"How hard is my MDP?" The distribution-norm to the rescue

Authors: Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor

DOI:

Keywords:

Abstract: … RL in an MDP where p represents the transition kernel of a state-action pair and f represents the value function of the MDP for a … Our formal notion of MDP hardness is summarized in …
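The truncated abstract points at the paper's central object: a norm on value functions f that depends on the transition distribution p of a state-action pair, so that "hard" MDPs are those whose value function varies a lot under p. As a minimal sketch, assuming the distribution-norm is the p-weighted standard deviation ‖f‖_p = sqrt(Var_{X~p}[f(X)]) (a semi-norm that vanishes on constant functions), the snippet below illustrates the idea; the function name distribution_norm and the toy numbers are illustrative, not from the paper:

```python
import numpy as np

def distribution_norm(p, f):
    """p-weighted standard deviation of f: ||f||_p = sqrt(Var_{X~p}[f(X)]).

    One plausible reading of the paper's distribution-norm (assumption):
    a semi-norm on functions, zero for constant f.
    p : 1-D array of transition probabilities over next states.
    f : 1-D array of value-function evaluations at those next states.
    """
    p = np.asarray(p, dtype=float)
    f = np.asarray(f, dtype=float)
    mean = p @ f                         # E_p[f]
    return np.sqrt(p @ (f - mean) ** 2)  # sqrt of Var_p(f)

# Toy example for a single state-action pair (hypothetical numbers):
p = np.array([0.7, 0.2, 0.1])   # transition kernel of the pair
V = np.array([1.0, 5.0, 9.0])   # value function at the three next states
print(distribution_norm(p, V))  # small norm => "easy" pair, large => "hard"
```

Under this reading, an MDP's hardness is driven by the largest such norm over state-action pairs, rather than by worst-case range bounds on the value function.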

References (22)
Gadiel Seroussi, Erik Ordentlich, Sergio Verdu, Tsachy Weissman, Marcelo J. Weinberger, Inequalities for the L1 Deviation of the Empirical Distribution, (2003)
Aviv Tamar, Dotan Di Castro, Shie Mannor, Temporal Difference Methods for the Variance of the Reward To Go, International Conference on Machine Learning, pp. 495–503, (2013)
Sham Machandranath Kakade, On the Sample Complexity of Reinforcement Learning, Doctoral thesis, UCL (University College London), (2003)
Daniel Mankowitz, Timothy Mann, Shie Mannor, Time-Regularized Interrupting Options (TRIO), International Conference on Machine Learning, pp. 1350–1358, (2014)
Thomas G. Dietterich, The MAXQ Method for Hierarchical Reinforcement Learning, International Conference on Machine Learning, pp. 118–126, (1998)
Damien Ernst, Arthur Louette, Introduction to Reinforcement Learning, MIT Press, (1998)
Istvan Szita, Csaba Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds, International Conference on Machine Learning, pp. 1031–1038, (2010)
Peter L. Bartlett, Ambuj Tewari, REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Uncertainty in Artificial Intelligence, pp. 35–42, (2009)
Massimiliano Pontil, Andreas Maurer, Empirical Bernstein Bounds and Sample Variance Penalization, Conference on Learning Theory, (2009)
Ronald Ortner, Peter Auer, Thomas Jaksch, Near-optimal Regret Bounds for Reinforcement Learning, Journal of Machine Learning Research, vol. 11, pp. 1563–1600, (2010)