Data-Efficient Policy Evaluation Through Behavior Policy Search

Authors: Scott Niekum, Peter Stone, Philip S. Thomas, Josiah P. Hanna

Keywords: Mean squared error, Markov decision process, Task (project management), Search algorithm, Computer science, Econometrics, Expression (mathematics)

Abstract: We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its …

References (17)

Doina Precup, Satinder P. Singh, Richard S. Sutton. Eligibility Traces for Off-Policy Policy Evaluation. International Conference on Machine Learning, pp. 759-766 (2000).

Julio da Motta Singer, Pranab Kumar Sen. Large Sample Methods in Statistics: An Introduction With Applications (1993).

Dimitri P. Bertsekas, John N. Tsitsiklis. Gradient Convergence in Gradient Methods with Errors. SIAM Journal on Optimization, vol. 10, pp. 627-642 (1999). DOI: 10.1137/S1052623497331063

Jordan Frank, Shie Mannor, Doina Precup. Reinforcement Learning in the Presence of Rare Events. Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 336-343 (2008). DOI: 10.1145/1390156.1390199

Michael Bowling, Martha White. Learning a Value Analysis Tool for Agent Evaluation. International Joint Conference on Artificial Intelligence, pp. 1976-1981 (2009).

Joel Veness, Marc Lanctot, Michael Bowling. Variance Reduction in Monte-Carlo Tree Search. Neural Information Processing Systems, vol. 24, pp. 1836-1844 (2011).

Yishay Mansour, Satinder P. Singh, Richard S. Sutton, David A. McAllester. Policy Gradient Methods for Reinforcement Learning with Function Approximation. Neural Information Processing Systems, vol. 12, pp. 1057-1063 (1999).

Yuval Tassa, Daan Wierstra, Alexander Pritzel, Tom Erez, Jonathan J. Hunt, Nicolas Heess, David Silver, Timothy P. Lillicrap. Continuous Control with Deep Reinforcement Learning. arXiv: Learning (2015).

Harm Van Seijen, Richard S. Sutton. True Online TD(λ). International Conference on Machine Learning (2014). DOI: 10.13140/2.1.1456.2568

Philip S. Thomas. A Notation for Markov Decision Processes. arXiv: Artificial Intelligence (2015).
