Classifying Options for Deep Reinforcement Learning

Authors: Anil Anthony Bharath, Murray Shanahan, Kai Arulkumaran, Nat Dilokthanakul

Abstract: In this paper we combine one method for hierarchical reinforcement learning, the options framework, with deep Q-networks (DQNs) through the use of different "option heads" on the policy network and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.
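
The architecture described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify layer types or sizes, whether the supervisory network shares features with the policy network, or how an option is selected. Here every layer is a single dense layer, the supervisory network reads the raw observation, and both the option and the action are picked greedily. All names (`OptionHeadsQNetwork`, `make_linear`, and so on) are hypothetical.

```python
import random

random.seed(0)


def make_linear(n_in, n_out):
    """Hypothetical helper: one dense layer as (weights, biases)."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply_linear(layer, x):
    """Compute W x + b for a layer built by make_linear."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(v, 0.0) for v in x]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


class OptionHeadsQNetwork:
    """Shared trunk with one Q-value head per option, plus a supervisory network.

    Assumption: sizes and the single-layer trunk are illustrative only.
    """

    def __init__(self, n_obs, n_hidden, n_actions, n_options):
        self.trunk = make_linear(n_obs, n_hidden)
        # One "option head" per option, each mapping shared features to Q-values.
        self.heads = [make_linear(n_hidden, n_actions) for _ in range(n_options)]
        # Supervisory network: scores each option for the current observation.
        self.supervisor = make_linear(n_obs, n_options)

    def q_values(self, obs):
        """Return a list of Q-value vectors, one per option head."""
        features = relu(apply_linear(self.trunk, obs))
        return [apply_linear(head, features) for head in self.heads]

    def option_scores(self, obs):
        return apply_linear(self.supervisor, obs)

    def act(self, obs):
        """Greedy choice: the supervisor picks an option, that head picks the action."""
        option = argmax(self.option_scores(obs))
        action = argmax(self.q_values(obs)[option])
        return option, action


net = OptionHeadsQNetwork(n_obs=8, n_hidden=16, n_actions=4, n_options=2)
obs = [random.gauss(0.0, 1.0) for _ in range(8)]
option, action = net.act(obs)
print("option:", option, "action:", action)
```

In this sketch the weights are random and untrained; a real agent would train each head with a DQN-style loss and would train the supervisory network separately, with details that the abstract leaves unspecified.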
