Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

Authors: Alexander G. Schwing, Jian Peng, Frank S. He

Abstract: We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
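
The abstract does not spell out the constraints, so the following is only a minimal sketch of one way "tightening optimality" could be expressed: a multi-step discounted return gives a lower bound on a Q-value, and violations of a lower or upper bound are penalized alongside the usual squared temporal-difference error. The function names, the penalty weight `lam`, and the quadratic penalty form are assumptions made for illustration, not details taken from this record.

```python
def lower_bound(rewards, bootstrap_q, gamma):
    """Discounted multi-step return plus a discounted bootstrap value.

    rewards:     rewards observed on the steps following the state-action pair
    bootstrap_q: estimate of max_a Q(s', a) at the state reached afterwards
                 (assumed here to come from a target network)
    gamma:       discount factor in (0, 1]
    """
    discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))
    return discounted_return + gamma ** len(rewards) * bootstrap_q


def tightened_loss(q, td_target, lower, upper, lam):
    """Squared TD error plus quadratic penalties for violated bounds.

    lam is a hypothetical penalty weight; lam = 0 recovers the plain
    squared TD error.
    """
    below = max(lower - q, 0.0)  # amount by which q falls under the lower bound
    above = max(q - upper, 0.0)  # amount by which q exceeds the upper bound
    return (q - td_target) ** 2 + lam * (below ** 2 + above ** 2)
```

When the Q-value already lies between the two bounds, both penalties vanish and only the TD term remains; when it falls below the lower bound, the extra term pushes it up toward observed returns, which is one plausible reading of "faster reward propagation".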
