Data-Efficient Policy Evaluation Through Behavior Policy Search

Authors: Scott Niekum, Peter Stone, Philip S. Thomas, Josiah P. Hanna

Keywords: Mean squared error; Markov decision process; Task (project management); Search algorithm; Computer science; Econometrics; Expression (mathematics)

Abstract: We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its …
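The abstract describes the standard unbiased technique: deploy the evaluation policy and average the returns it produces. The sketch below is a toy illustration under stated assumptions, not the authors' method. It assumes a hypothetical one-step MDP (a two-armed bandit) with made-up reward means (`REWARD_MEANS`, `NOISE`), a hand-picked `BEHAVIOR_POLICY`, and uses ordinary importance sampling as the off-policy estimator; every name here is hypothetical. It compares the on-policy Monte Carlo estimate with an importance-sampling estimate built from data collected by a different behavior policy, and computes the exact per-episode variance of each.

```python
import random

# Hypothetical one-step MDP (a two-armed bandit): each action's reward is
# Gaussian with an assumed mean and a shared standard deviation NOISE.
REWARD_MEANS = {0: 1.0, 1: 5.0}
NOISE = 0.5

EVAL_POLICY = {0: 0.5, 1: 0.5}      # policy to evaluate (assumed for illustration)
BEHAVIOR_POLICY = {0: 0.2, 1: 0.8}  # hand-picked behavior policy (assumed)


def true_value(policy):
    """Exact expected reward of a policy in the toy MDP."""
    return sum(p * REWARD_MEANS[a] for a, p in policy.items())


def sample_action(policy, rng):
    actions = list(policy)
    return rng.choices(actions, weights=[policy[a] for a in actions], k=1)[0]


def sample_reward(action, rng):
    return REWARD_MEANS[action] + rng.gauss(0.0, NOISE)


def on_policy_mc(policy, n, rng):
    """On-policy Monte Carlo: run the policy itself and average its returns."""
    return sum(sample_reward(sample_action(policy, rng), rng) for _ in range(n)) / n


def importance_sampling(eval_policy, behavior_policy, n, rng):
    """Ordinary importance sampling: act with behavior_policy and reweight
    each return by eval_policy(a) / behavior_policy(a). Unbiased whenever
    behavior_policy(a) > 0 for every a with eval_policy(a) > 0."""
    total = 0.0
    for _ in range(n):
        a = sample_action(behavior_policy, rng)
        total += eval_policy[a] / behavior_policy[a] * sample_reward(a, rng)
    return total / n


def per_episode_variance(eval_policy, behavior_policy):
    """Exact variance of a single-episode IS estimate in the toy MDP:
    sum_a pi_e(a)^2 / pi_b(a) * E[R_a^2] - v(pi_e)^2.
    With behavior_policy == eval_policy this is the on-policy variance."""
    second_moment = sum(
        eval_policy[a] ** 2 / behavior_policy[a] * (REWARD_MEANS[a] ** 2 + NOISE ** 2)
        for a in eval_policy
    )
    return second_moment - true_value(eval_policy) ** 2


def empirical_mse(estimator, trials, rng):
    """Mean squared error of an estimator against the exact value of EVAL_POLICY."""
    v = true_value(EVAL_POLICY)
    return sum((estimator(rng) - v) ** 2 for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20  # episodes per estimate
    mse_on = empirical_mse(lambda r: on_policy_mc(EVAL_POLICY, n, r), 500, rng)
    mse_is = empirical_mse(
        lambda r: importance_sampling(EVAL_POLICY, BEHAVIOR_POLICY, n, r), 500, rng)
    print(f"on-policy MSE ~ {mse_on:.3f}, behavior-policy IS MSE ~ {mse_is:.3f}")
    # Exact per-episode variances: 4.25 (on-policy) vs about 0.453 (IS)
    print(per_episode_variance(EVAL_POLICY, EVAL_POLICY),
          per_episode_variance(EVAL_POLICY, BEHAVIOR_POLICY))
```

Both estimators are unbiased in this toy setting, so any difference in mean squared error comes from variance alone. Because each action's contribution to the second moment is scaled by pi_e(a)^2 / pi_b(a), shifting probability toward the action with the larger reward magnitude lowers the variance. Here the behavior policy is fixed by hand; the paper's title indicates that the behavior policy is found by search, which this sketch does not implement.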