Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Authors: Quanquan Gu, Dongruo Zhou, Jiafan He

DOI:

Keywords: Minimax; Markov decision process; Computer science; Uncertainty principle; Regret; Upper and lower bounds; Reinforcement learning; Combinatorics; Discounting; Logarithm

Abstract: … We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) … factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs. …

References (31)
Sham Machandranath Kakade, On the Sample Complexity of Reinforcement Learning. Doctoral thesis, UCL (University College London), 2003.
Istvan Szita, Csaba Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds. International Conference on Machine Learning, pp. 1031–1038, 2010.
Nicolo Cesa-Bianchi, Gabor Lugosi, Prediction, Learning, and Games. 2006.
Massimiliano Pontil, Andreas Maurer, Empirical Bernstein Bounds and Sample Variance Penalization. Conference on Learning Theory, 2009.
Ronald Ortner, Peter Auer, Thomas Jaksch, Near-optimal Regret Bounds for Reinforcement Learning. Journal of Machine Learning Research, vol. 11, pp. 1563–1600, 2010.
Tor Lattimore, Marcus Hutter, PAC bounds for discounted MDPs. Algorithmic Learning Theory, pp. 320–334, 2012, 10.1007/978-3-642-34106-9_26.
Alexander L. Strehl, Michael L. Littman, An analysis of model-based Interval Estimation for Markov Decision Processes. Journal of Computer and System Sciences, vol. 74, pp. 1309–1331, 2008, 10.1016/J.JCSS.2007.08.009.
Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert J. Kappen, Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Machine Learning, vol. 91, pp. 325–349, 2013, 10.1007/S10994-013-5368-1.
Satinder P. Singh, Michael J. Kearns, Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms. Advances in Neural Information Processing Systems, vol. 11, pp. 996–1002, 1998.