Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments

Authors: Gaurav S. Sukhatme, Youngwoon Lee, Joseph J. Lim, Max Pflueger, Peter Englert

DOI:

Keywords: Human–computer interaction, Motion (physics), Code (cryptography), Computer science, SIGNAL (programming language), Planner, Reinforcement learning, Robot, Action (philosophy)

Abstract: Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks requiring contacts with the environment. To combine the benefits of both approaches, we propose motion planner augmented RL (MoPA-RL), which augments the action space of an RL agent with the long-horizon planning capabilities of motion planners. Based on the magnitude of the action, our approach smoothly transitions between directly executing the action and invoking a motion planner. We evaluate our approach on various simulated manipulation tasks and compare it to alternative action spaces in terms of learning efficiency and safety. The experiments demonstrate that MoPA-RL increases learning efficiency, leads to faster exploration, and results in safer policies that avoid collisions with the environment. Videos and code are available at this https URL.
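The core mechanism the abstract describes is a magnitude-based dispatch: small actions are executed directly in one environment step, while large actions are treated as goal displacements handed to a motion planner. A minimal sketch of that switch, assuming a joint-space action, an illustrative threshold value, and a straight-line stand-in for the planner (the function names, the threshold, and the placeholder path are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

# Assumed magnitude threshold for switching between direct execution
# and planner invocation (illustrative value, not from the paper).
ACTION_THRESHOLD = 0.1

def execute_directly(q, delta_q):
    # One-step execution of a small joint-space action.
    return q + delta_q

def plan_and_execute(q, delta_q):
    # Stand-in for invoking a motion planner (e.g. a sampling-based
    # planner) toward q + delta_q; a real planner would return a
    # collision-free joint-space path rather than a straight line.
    goal = q + delta_q
    path = np.linspace(q, goal, num=10)  # placeholder path
    return path[-1]

def step(q, delta_q):
    """Dispatch on action magnitude, as the MoPA-RL abstract describes."""
    if np.max(np.abs(delta_q)) <= ACTION_THRESHOLD:
        return execute_directly(q, delta_q)
    return plan_and_execute(q, delta_q)
```

Because both branches ultimately reach q + delta_q, the dispatch is transparent to the RL agent: it only enlarges the effective range of a single action, which is what lets the policy take long-horizon, collision-free "macro" steps through obstructed regions.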

References (36)
S. LaValle, Rapidly-exploring random trees: A new tool for path planning. Annual research report, (1998)
M. H. Overmars, A random approach to motion planning. (1992)
Rok Vuga, Bojan Nemec, Aleš Ude, Enhanced Policy Adaptation Through Directed Explorative Learning. International Journal of Humanoid Robotics, vol. 12, pp. 1550028, (2015), 10.1142/S0219843615500280
Sertac Karaman, Emilio Frazzoli, Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research, vol. 30, pp. 846–894, (2011), 10.1177/0278364911406761
Mohamed Elbanhawi, Milan Simic, Sampling-Based Robot Motion Planning: A Review. IEEE Access, vol. 2, pp. 56–77, (2014), 10.1109/ACCESS.2014.2302442
Vladlen Koltun, Sergey Levine, Guided Policy Search. International Conference on Machine Learning, pp. 1–9, (2013)
Richard S. Sutton, Doina Precup, Satinder Singh, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, vol. 112, pp. 181–211, (1999), 10.1016/S0004-3702(99)00052-1
Ioan A. Sucan, Mark Moll, Lydia E. Kavraki, The Open Motion Planning Library. IEEE Robotics & Automation Magazine, vol. 19, pp. 72–82, (2012), 10.1109/MRA.2012.2205651
L. Kavraki, J.-C. Latombe, Randomized preprocessing of configuration space for fast path planning. International Conference on Robotics and Automation, pp. 2138–2145, (1994), 10.1109/ROBOT.1994.350966
Roderic Grupen, Christopher Connolly, Andrew G. Barto, Satinder P. Singh, Robust Reinforcement Learning in Motion Planning. Neural Information Processing Systems, vol. 6, pp. 655–662, (1993)