Exploration Driven by an Optimistic Bellman Equation

Authors: Samuele Tosatto, Carlo D'Eramo, Joni Pajarinen, Marcello Restelli, Jan Peters

DOI: 10.1109/IJCNN.2019.8851736

Abstract: Exploring high-dimensional state spaces and finding sparse rewards are central problems in reinforcement learning. Exploration strategies are frequently either naïve (e.g., simplistic …
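
The title's central object, an optimistic Bellman equation, can be sketched in a generic form; the following is an illustrative template, not necessarily the exact operator derived in the paper. One common way to encode optimism is to inflate the greedy Bellman backup with an uncertainty bonus U(s,a) weighted by a coefficient beta >= 0 (both U and beta are assumed symbols for this sketch, not taken from the abstract):

% Illustrative optimistic Bellman backup (assumed generic form).
% Q^{+} is the optimistic action-value, U(s,a) an uncertainty bonus,
% and \beta \ge 0 controls how strongly optimism perturbs the backup.
Q^{+}(s,a) \;=\; r(s,a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\left[ \max_{a'} Q^{+}(s',a') \right] \;+\; \beta\, U(s,a)

Setting beta = 0 recovers the standard Bellman optimality equation, so the bonus only shifts the fixed point in regions where the agent is uncertain, which is what drives directed exploration.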
