Self-Imitation Learning

Authors: Satinder Singh, Honglak Lee, Junhyuk Oh, Yijie Guo

DOI:

Keywords: Artificial intelligence, Imitation learning, Computer science, Simple (abstract algebra)

Abstract: … any actor-critic architecture. (4) We demonstrate that SIL combined with advantage actor-critic (A2C… In this paper, we focus on the combination of advantage actor-critic (A2C) (Mnih et al.…

References (31)
Brian D. Ziebart, J. Andrew Bagnell. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University (2010). DOI: 10.1184/R1/6720692.V1
Pieter Abbeel, Sergey Levine, Bradly C. Stadie. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arXiv: Artificial Intelligence (2015).
Doina Precup, Satinder P. Singh, Richard S. Sutton. Eligibility Traces for Off-Policy Policy Evaluation. International Conference on Machine Learning, pp. 759-766 (2000).
Brian D. Ziebart, J. Andrew Bagnell, Anind K. Dey, Andrew Maas. Maximum entropy inverse reinforcement learning. National Conference on Artificial Intelligence, pp. 1433-1438 (2008).
Peter Dayan, Máté Lengyel. Hippocampal Contributions to Control: The Third Way. Neural Information Processing Systems, vol. 20, pp. 889-896 (2007).
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). DOI: 10.1038/NATURE14236
M. G. Bellemare, Y. Naddaf, J. Veness, M. Bowling. The arcade learning environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research, vol. 47, pp. 253-279 (2013). DOI: 10.1613/JAIR.3912
Yishay Mansour, Satinder P. Singh, Richard S Sutton, David A. McAllester. Policy Gradient Methods for Reinforcement Learning with Function Approximation. Neural Information Processing Systems, vol. 12, pp. 1057-1063 (1999).
John N. Tsitsiklis, Vijay R. Konda. Actor-critic algorithms (2002).