Authors: Gaurav S. Sukhatme, Youngwoon Lee, Joseph J. Lim, Max Pflueger, Peter Englert
DOI:
Keywords: Human–computer interaction, Motion (physics), Code (cryptography), Computer science, SIGNAL (programming language), Planner, Reinforcement learning, Robot, Action (philosophy)
Abstract: Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks that require contacts with the environment. To combine the benefits of both approaches, we propose motion planner augmented RL (MoPA-RL), which augments the action space of an RL agent with the long-horizon planning capabilities of motion planners. Based on the magnitude of the action, our approach smoothly transitions between directly executing the action and invoking a motion planner. We evaluate our approach on various simulated manipulation tasks and compare it to alternative action spaces in terms of learning efficiency and safety. The experiments demonstrate that MoPA-RL increases learning efficiency, leads to faster exploration, and results in safer policies that avoid collisions with the environment. Videos and code are available at this https URL.
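The switching rule described in the abstract can be sketched briefly. The snippet below is a minimal illustration, assuming a per-dimension threshold `omega` that separates direct execution from planner invocation; the function names (`mopa_action`, `linear_planner`) and the interpolating planner are hypothetical stand-ins, not the paper's actual interface.

```python
import numpy as np

def mopa_action(action, joint_pos, planner, omega=0.1):
    """Illustrative sketch of MoPA-RL's action-space switching
    (hypothetical API, not the paper's implementation).

    If every component of the action lies within the direct-execution
    range [-omega, omega], the action is treated as a single joint
    displacement. Otherwise it is interpreted as a faraway joint-space
    goal, and a motion planner produces a sequence of collision-free
    displacements that reach it.
    """
    if np.max(np.abs(action)) <= omega:
        return [action]                   # small action: execute directly
    goal = joint_pos + action             # large action: distant goal
    waypoints = planner(joint_pos, goal)  # collision-free joint path
    # Convert the planned path into per-step displacements.
    return list(np.diff(np.vstack([joint_pos, waypoints]), axis=0))

if __name__ == "__main__":
    # Toy "planner" that linearly interpolates to the goal.
    linear_planner = lambda q0, qg: np.linspace(q0, qg, num=5)[1:]
    print(mopa_action(np.array([0.05, -0.02]), np.zeros(2), linear_planner))
    print(mopa_action(np.array([0.8, -0.6]), np.zeros(2), linear_planner))
```

Because both branches return a list of ordinary displacement actions, the RL agent's interface to the environment is unchanged; the planner simply expands one large action into several safe small ones.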