Importance sampling in reinforcement learning with an estimated behavior policy

Authors: Scott Niekum, Peter Stone, Josiah P. Hanna

DOI: 10.1007/S10994-020-05938-9

Keywords: Reinforcement learning; Machine learning; Importance sampling; Artificial intelligence; Computer science

References (5):
Doina Precup, Richard S. Sutton, Satinder P. Singh. Eligibility Traces for Off-Policy Policy Evaluation. International Conference on Machine Learning (ICML), pp. 759-766 (2000).
R.S. Sutton, A.G. Barto. Reinforcement Learning: An Introduction (1998).
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). doi:10.1038/nature14236
Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou. Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning. arXiv preprint, cs.LG (2020).