Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

作者： Quanquan Gu , Dongruo Zhou , Jiafan He

DOI:

关键词: Minimax 、 Markov decision process 、 Computer science 、 Uncertainty principle 、 Regret 、 Upper and lower bounds 、 Reinforcement learning 、 Combinatorics 、 Discounting 、 Logarithm

摘要: … We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) … factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs. …

参考文章(31)

Sham Machandranath Kakade, On the Sample Complexity of Reinforcement Learning Doctoral thesis, UCL (University College London).. ,(2003)

R-max - a general polynomial time algorithm for near-optimal reinforcement learning Journal of Machine Learning Research. ,vol. 3, pp. 213- 231 ,(2003) , 10.1162/153244303765208377

Istvan Szita, Csaba Szepesv ri, Model-based reinforcement learning with nearly tight exploration complexity bounds international conference on machine learning. pp. 1031- 1038 ,(2010)

Nicolo Cesa-Bianchi, Gabor Lugosi, Prediction, learning, and games ,(2006)

Massimiliano Pontil, Andreas Maurer, Empirical Bernstein Bounds and Sample Variance Penalization conference on learning theory. ,(2009)

Ronald Ortner, Peter Auer, Thomas Jaksch, Near-optimal Regret Bounds for Reinforcement Learning Journal of Machine Learning Research. ,vol. 11, pp. 1563- 1600 ,(2010)

Tor Lattimore, Marcus Hutter, PAC bounds for discounted MDPs algorithmic learning theory. pp. 320- 334 ,(2012) , 10.1007/978-3-642-34106-9_26

Alexander L. Strehl, Michael L. Littman, An analysis of model-based Interval Estimation for Markov Decision Processes Journal of Computer and System Sciences. ,vol. 74, pp. 1309- 1331 ,(2008) , 10.1016/J.JCSS.2007.08.009

Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert J. Kappen, Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model Machine Learning. ,vol. 91, pp. 325- 349 ,(2013) , 10.1007/S10994-013-5368-1

10.

Satinder P. Singh, Michael J. Kearns, Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms neural information processing systems. ,vol. 11, pp. 996- 1002 ,(1998)

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

来源期刊

我的账户

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

来源期刊

相似文章 0

我的账户