One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors

Authors: Pieter Abbeel, Sergey Levine, Justin Fu

Abstract: One of the key challenges in applying reinforcement learning to complex robotic control tasks is the need to gather large amounts of experience in order to find an effective policy for the task at hand. Model-based reinforcement learning can achieve good sample efficiency, but requires the ability to learn a model of the dynamics that is good enough to learn an effective policy. In this work, we develop a model-based reinforcement learning algorithm that combines prior knowledge from previous tasks with online adaptation of the dynamics model. These two ingredients enable highly sample-efficient learning even in regimes where estimating the true dynamics is very difficult, since the online model adaptation allows the method to locally compensate for unmodeled variation in the dynamics. We encode the prior experience into a neural network dynamics model, adapt it online by progressively refitting a local linear model of the dynamics, and use model predictive control to plan under these dynamics. Our experimental results show that this approach can be used to solve a variety of complex robotic manipulation tasks in just a single attempt, using prior data from other manipulation behaviors.
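As a rough illustration of what "progressively refitting a local linear model" around a prior could look like, the sketch below fits x_{t+1} ≈ W [x_t; u_t; 1] by exponentially weighted ridge regression that shrinks toward a prior linearization. The abstract does not specify the actual update rule, so this is a simplified stand-in and not the authors' method. The function name `refit_local_linear`, the `decay` and `prior_weight` parameters, and the assumption that `W_prior` comes from linearizing a learned neural network model are all hypothetical.

```python
import numpy as np

def refit_local_linear(Z, Y, W_prior, prior_weight=1.0, decay=0.9):
    """Refit a local linear dynamics model from recent transitions.

    Z        : (T, d_in) rows of [x_t, u_t, 1] (state, action, bias term)
    Y        : (T, d_out) rows of observed next states x_{t+1}
    W_prior  : (d_out, d_in) prior linear model (assumed to come from
               linearizing a learned prior model; hypothetical here)
    Returns W minimizing
        sum_t w_t * ||Y_t - W Z_t||^2 + prior_weight * ||W - W_prior||_F^2
    where w_t decays exponentially with the age of the sample.
    """
    T = Z.shape[0]
    # Newest sample gets weight 1, older samples are down-weighted.
    w = decay ** np.arange(T - 1, -1, -1)
    Zw = w[:, None] * Z
    A = Z.T @ Zw + prior_weight * np.eye(Z.shape[1])  # (d_in, d_in), symmetric
    B = Y.T @ Zw + prior_weight * W_prior             # (d_out, d_in)
    # W = B A^{-1}; solve the transposed system instead of inverting A.
    return np.linalg.solve(A, B.T).T
```

With no observed transitions the result equals `W_prior`; as data from the current trial accumulates, the fit moves toward the locally observed dynamics. A planner such as model predictive control would then be run against the refit model at each step.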

References (23)
Emanuel Todorov, Weiwei Li, Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems. pp. 222-229 (2004)
Ali Punjani, Pieter Abbeel, Deep learning helicopter dynamics models. International Conference on Robotics and Automation, pp. 3223-3230 (2015), 10.1109/ICRA.2015.7139643
Sascha Lange, Martin Riedmiller, Arne Voigtlander, Autonomous reinforcement learning on raw visual input data in a real world application. International Joint Conference on Neural Networks, pp. 1-8 (2012), 10.1109/IJCNN.2012.6252823
Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, vol. 32, pp. 1238-1274 (2013), 10.1177/0278364913495721
Duy Nguyen-Tuong, Jan Peters, Using model knowledge for learning inverse dynamics. International Conference on Robotics and Automation, pp. 2677-2682 (2010), 10.1109/ROBOT.2010.5509858
Anil Aswani, Patrick Bouffard, Claire Tomlin, Extensions of learning-based model predictive control for real-time application to a quadrotor helicopter. American Control Conference, pp. 4661-4666 (2012), 10.1109/ACC.2012.6315483
Yan Wu, Yiannis Demiris, Towards One Shot Learning by imitation for humanoid robots. International Conference on Robotics and Automation, pp. 2889-2894 (2010), 10.1109/ROBOT.2010.5509429
Gerhard Neumann, Marc Peter Deisenroth, Jan Peters, A Survey on Policy Search for Robotics (2013)
Joschka Boedecker, Jost Tobias Springenberg, Jan Wülfing, Martin Riedmiller, Approximate real-time optimal control based on sparse Gaussian process models. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp. 1-8 (2014), 10.1109/ADPRL.2014.7010608