Reinforcement and Imitation Learning for Diverse Visuomotor Skills

Authors: Josh Merel, Tom Erez, Nando de Freitas, Yuke Zhu, Ziyu Wang

Abstract: We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this …

References (47)
Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv: Learning (2014).
Sham Kakade, John Langford. Approximately Optimal Approximate Reinforcement Learning. International Conference on Machine Learning, pp. 267-274 (2002).
Andrew Y. Ng, Stuart J. Russell, Daishi Harada. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning, pp. 278-287 (1999).
Gerhard Neumann, Marc Peter Deisenroth, Jan Peters. A Survey on Policy Search for Robotics (2013).
Abdeslam Boularias, Jens Kober, Jan R. Peters. Relative Entropy Inverse Reinforcement Learning. International Conference on Artificial Intelligence and Statistics, pp. 182-189 (2011).
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). doi:10.1038/NATURE14236
Pieter Abbeel, Chelsea Finn, Sergey Levine, Trevor Darrell. End-to-End Training of Deep Visuomotor Policies. arXiv: Learning (2015).
Emanuel Todorov, Tom Erez, Yuval Tassa. MuJoCo: A physics engine for model-based control. Intelligent Robots and Systems, pp. 5026-5033 (2012). doi:10.1109/IROS.2012.6386109
Lerrel Pinto, Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours. International Conference on Robotics and Automation, pp. 3406-3413 (2016). doi:10.1109/ICRA.2016.7487517
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, vol. 529, pp. 484-489 (2016). doi:10.1038/NATURE16961