Authors: Scott Niekum, Peter Stone, Philip S. Thomas, Josiah P. Hanna
DOI:
Keywords: Mean squared error, Markov decision process, Task (project management), Search algorithm, Computer science, Econometrics, Expression (mathematics)
Abstract: We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its …