Apprenticeship learning and reinforcement learning with application to robotic control

Author: Pieter Abbeel

Abstract: Many problems in robotics have unknown, stochastic, high-dimensional, and highly nonlinear dynamics, and offer significant challenges to both traditional control methods and reinforcement learning algorithms. Some of the key difficulties that arise in these problems are: (i) It is often difficult to write down, in closed form, a formal specification of the control task. For example, what is the objective function for "flying well"? (ii) It is often difficult to build a good dynamics model, because of both the data collection and the modeling challenges involved (similar to the "exploration problem" in reinforcement learning). (iii) It is often computationally expensive to find closed-loop controllers for high-dimensional, stochastic domains. We describe learning algorithms with formal performance guarantees which show that these problems can be efficiently addressed in the apprenticeship learning setting, the setting in which expert demonstrations of the task are available. Our algorithms are guaranteed to return a policy with performance comparable to the expert's, when evaluated on the same (typically high-dimensional and non-linear) environment as the expert. Besides having theoretical guarantees, our algorithms have also enabled us to solve some previously unsolved real-world control problems: They have enabled a quadruped robot to traverse challenging, previously unseen terrain, and they have significantly extended the state of the art in autonomous helicopter flight. Our helicopter has performed by far the most challenging aerobatic maneuvers flown by any autonomous helicopter to date, including maneuvers such as continuous in-place flips, rolls, and tic-tocs, which only exceptional human pilots can fly. Our aerobatic flight performance is comparable to that of the best human pilots.
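The apprenticeship learning setting the abstract describes can be illustrated with the feature-expectation matching idea from Abbeel and Ng's projection algorithm: find a policy whose discounted feature expectations match the expert's, which bounds the policy's loss under any reward that is linear in the features. Below is a minimal, self-contained sketch of that idea; the 4-state chain MDP, the one-hot features, and the hand-coded "expert" are illustrative assumptions for the demo, not taken from the thesis.

```python
import numpy as np

# Tiny deterministic chain MDP (an assumption for illustration only):
# action 0 steps left, action 1 steps right.
N_STATES, GAMMA, START = 4, 0.9, 0
NEXT = np.array([[0, 0, 1, 2],   # action 0: move left
                 [1, 2, 3, 3]])  # action 1: move right
PHI = np.eye(N_STATES)           # one-hot state features

def feature_expectations(policy, horizon=500):
    """mu(pi) = sum_t gamma^t phi(s_t) along the policy's rollout."""
    mu, s = np.zeros(N_STATES), START
    for t in range(horizon):
        mu += GAMMA**t * PHI[s]
        s = NEXT[policy[s], s]
    return mu

def greedy_policy(w, iters=200):
    """Value iteration under reward r(s) = w . phi(s); return greedy policy."""
    r = PHI @ w
    V = np.zeros(N_STATES)
    for _ in range(iters):
        V = r + GAMMA * np.max(V[NEXT], axis=0)
    return np.argmax(V[NEXT], axis=0)

expert = np.array([1, 1, 1, 1])          # "expert" demo: always move right
mu_E = feature_expectations(expert)

# Projection algorithm: maintain mu_bar, the closest point to mu_E in the
# convex hull of feature expectations found so far, and repeatedly compute
# a best-response policy to the reward direction w = mu_E - mu_bar.
policy = np.zeros(N_STATES, dtype=int)   # initialize with "always left"
mu_bar = feature_expectations(policy)
for _ in range(20):
    w = mu_E - mu_bar
    if np.linalg.norm(w) < 1e-6:         # feature expectations matched
        break
    policy = greedy_policy(w)
    mu = feature_expectations(policy)
    d = mu - mu_bar
    if d @ d < 1e-12:                    # no progress possible
        break
    mu_bar = mu_bar + (d @ w) / (d @ d) * d   # orthogonal projection step
```

On this toy chain the loop recovers the expert's behavior in a couple of iterations; the real algorithms in the thesis use the same matching principle with learned dynamics models and far higher-dimensional features.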
