Self-Imitation Learning

Authors: Satinder Singh, Honglak Lee, Junhyuk Oh, Yijie Guo

Keywords: Artificial intelligence, Imitation learning, Computer science, Simple (abstract algebra)

Abstract: … any actor-critic architecture. (4) We demonstrate that SIL combined with advantage actor-critic (A2C… In this paper, we focus on the combination of advantage actor-critic (A2C) (Mnih et al.…

References (31)

Brian D. Ziebart, J. Andrew Bagnell. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University (2010). DOI: 10.1184/R1/6720692.V1

Pieter Abbeel, Sergey Levine, Bradly C. Stadie. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arXiv: Artificial Intelligence (2015).

Doina Precup, Satinder P. Singh, Richard S. Sutton. Eligibility Traces for Off-Policy Policy Evaluation. International Conference on Machine Learning, pp. 759-766 (2000).

Brian D. Ziebart, J. Andrew Bagnell, Anind K. Dey, Andrew Maas. Maximum entropy inverse reinforcement learning. National Conference on Artificial Intelligence, pp. 1433-1438 (2008).

Peter Dayan, Máté Lengyel. Hippocampal Contributions to Control: The Third Way. Neural Information Processing Systems, vol. 20, pp. 889-896 (2007).

Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching. Machine Learning, vol. 8, pp. 293-321 (1992). DOI: 10.1007/BF00992699

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). DOI: 10.1038/NATURE14236

M. G. Bellemare, Y. Naddaf, J. Veness, M. Bowling. The arcade learning environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research, vol. 47, pp. 253-279 (2013). DOI: 10.1613/JAIR.3912

Yishay Mansour, Satinder P. Singh, Richard S Sutton, David A. McAllester. Policy Gradient Methods for Reinforcement Learning with Function Approximation. Neural Information Processing Systems, vol. 12, pp. 1057-1063 (1999).

John N. Tsitsiklis, Vijay R. Konda. Actor-critic algorithms (2002).
