Off-policy Model-based Learning under Unknown Factored Dynamics

Authors: Assaf Hallak, Francois Schnitzler, Timothy Mann, Shie Mannor

Abstract: Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use. But how can we prove superiority without testing the new policy? To answer this question, we introduce the G-SCOPE algorithm, which evaluates a new policy based on data generated by the existing policy. Our algorithm is both computationally and sample efficient because it greedily learns to exploit factored structure in the dynamics of the environment. We present a finite sample analysis of our approach and show through experiments that the algorithm scales well on high-dimensional problems with few samples.
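The general idea of model-based off-policy evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not G-SCOPE: it uses a plain tabular maximum-likelihood model and ignores factored structure entirely. The function names (`estimate_model`, `evaluate_policy`), the toy two-state problem, and the self-loop default for unvisited state-action pairs are all assumptions made for illustration.

```python
from collections import defaultdict


def estimate_model(transitions, n_states, n_actions):
    """Fit a tabular maximum-likelihood model from logged (s, a, r, s_next) tuples."""
    counts = defaultdict(lambda: [0] * n_states)
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1

    P, R = {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            n = visits[(s, a)]
            if n == 0:
                # Assumption: an unvisited pair is modeled as a zero-reward self-loop.
                P[(s, a)] = [1.0 if t == s else 0.0 for t in range(n_states)]
                R[(s, a)] = 0.0
            else:
                P[(s, a)] = [c / n for c in counts[(s, a)]]
                R[(s, a)] = reward_sum[(s, a)] / n
    return P, R


def evaluate_policy(P, R, policy, n_states, horizon):
    """Finite-horizon value of a deterministic target policy on the learned model."""
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [
            R[(s, policy[s])]
            + sum(p * V[t] for t, p in enumerate(P[(s, policy[s])]))
            for s in range(n_states)
        ]
    return V


# Toy data, as if logged by some behavior policy: action a leads to state a,
# and only action 1 is rewarded.
logged = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 0.0, 0), (1, 1, 1.0, 1)]
P, R = estimate_model(logged, n_states=2, n_actions=2)

# Target policy (never executed in the environment): always take action 1.
print(evaluate_policy(P, R, policy=[1, 1], n_states=2, horizon=5))  # [5.0, 5.0]
```

The target policy is scored only on the learned model, so no new interaction with the environment is needed. A tabular model like this one needs data for every state-action pair, which is the cost that exploiting factored dynamics is meant to reduce in high-dimensional problems.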
