Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning.

Authors: Matthew E. Taylor, Gabriel Victor de la Cruz, Yunshu Du

Abstract: Deep Reinforcement Learning (DRL) algorithms are known to be data inefficient. One reason is that a DRL agent learns both the feature and the policy tabula rasa. Integrating …

References (34)
Bilal Piot, Matthieu Geist, Olivier Pietquin. Boosted Bellman Residual Minimization Handling Expert Demonstrations. European Conference on Machine Learning, vol. 8725, pp. 549-564 (2014). 10.1007/978-3-662-44851-9_35
Long-Ji Lin. Reinforcement Learning for Robots Using Neural Networks. Carnegie Mellon University (1992).
Brenna D. Argall, Sonia Chernova, Manuela Veloso, Brett Browning. A Survey of Robot Learning from Demonstration. Robotics and Autonomous Systems, vol. 57, pp. 469-483 (2009). 10.1016/J.ROBOT.2008.10.024
Peter Stone, Matthew E. Taylor. Transfer Learning for Reinforcement Learning Domains: A Survey. Journal of Machine Learning Research, vol. 10, pp. 1633-1685 (2009). 10.5555/1577069.1755839
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, vol. 115, pp. 211-252 (2015). 10.1007/S11263-015-0816-Y
A.G. Barto, R.S. Sutton. Reinforcement Learning: An Introduction (1988).
Doina Precup, Joelle Pineau, Amir massoud Farahmand, Beomjoon Kim. Learning from Limited Demonstrations. Neural Information Processing Systems, vol. 26, pp. 2859-2867 (2013).
Yoshua Bengio, Pierre-Antoine Manzagol, Samy Bengio, Dumitru Erhan, Aaron Courville, Pascal Vincent. Why Does Unsupervised Pre-training Help Deep Learning? Journal of Machine Learning Research, vol. 11, pp. 625-660 (2010).
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, vol. 518, pp. 529-533 (2015). 10.1038/NATURE14236
Stefan Schaal. Learning from Demonstration. Neural Information Processing Systems, vol. 9, pp. 1040-1046 (1996).