Author: Pieter Abbeel
DOI:
Keywords:
Abstract: Many problems in robotics have unknown, stochastic, high-dimensional, and highly nonlinear dynamics, and offer significant challenges to both traditional control methods and reinforcement learning algorithms. Some of the key difficulties that arise in these problems are: (i) It is often difficult to write down, in closed form, a formal specification of the control task. For example, what is the objective function for "flying well"? (ii) It is often difficult to build a good dynamics model because of both data collection and data modeling challenges (similar to the "exploration problem" in reinforcement learning). (iii) It is often computationally expensive to find closed-loop controllers for high-dimensional, stochastic domains. We describe learning algorithms with formal performance guarantees which show that these problems can be efficiently addressed in the apprenticeship learning setting, that is, the setting in which expert demonstrations of the task are available. Our algorithms are guaranteed to return a control policy with performance comparable to the expert's. We evaluate performance on the same task and in the same (typically stochastic, high-dimensional, and nonlinear) environment as the expert. Besides having theoretical guarantees, our algorithms have also enabled us to solve some previously unsolved real-world control problems: They have enabled a quadruped robot to traverse challenging, previously unseen terrain. They have significantly extended the state of the art in autonomous helicopter flight. Our helicopter has performed by far the most challenging aerobatic maneuvers performed by any autonomous helicopter to date, including maneuvers such as continuous in-place flips, rolls, and tic-tocs, which only exceptional human pilots can fly. Our aerobatic flight performance is comparable to that of the best human pilots.