Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones

Authors: Chelsea Finn, Ken Goldberg, Joseph E. Gonzalez, Julian Ibarz, Minho Hwang

DOI:

Keywords: Task (project management), Constrained optimization, Machine learning, Computer science, Robot, Artificial intelligence, Obstacle, Obstacle avoidance, Constraint satisfaction, Constraint (information theory), Reinforcement learning

Abstract: Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting that exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward, and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including contact-rich manipulation tasks and an image-based navigation task, and on an obstacle avoidance task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task performance and safety via constrained optimization or reward shaping, and find that Recovery RL outperforms the next best prior method across all domains. Results suggest that Recovery RL trades off constraint violations and task successes 2 - 80 times more efficiently in simulation domains and 3 times more efficiently in physical experiments. See https URL for videos and supplementary material.
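The two-policy mechanism described in the abstract can be sketched as a simple action-selection rule: a safety critic scores the task policy's proposed action, and the recovery policy takes over when that score exceeds a risk threshold. The sketch below is illustrative only; the function names, the threshold value, and the toy 1-D dynamics are assumptions, not the authors' implementation.

```python
def select_action(state, task_policy, recovery_policy, q_risk, eps_risk=0.3):
    """Return the action executed in the environment under a Recovery-RL-style
    rule: defer to the recovery policy when the safety critic q_risk judges
    the task policy's proposed action too likely to lead to a violation.
    All names here are illustrative assumptions."""
    a_task = task_policy(state)
    if q_risk(state, a_task) > eps_risk:  # proposed action deemed unsafe
        return recovery_policy(state)     # steer back toward the safe region
    return a_task                         # safe enough: pursue the task

if __name__ == "__main__":
    # Toy 1-D example: states beyond x = 0.8 are constraint-violating.
    task_policy = lambda s: 0.1               # always move right (toward goal)
    recovery_policy = lambda s: -0.1          # always retreat left (to safety)
    q_risk = lambda s, a: float(s + a > 0.8)  # critic: risky past x = 0.8
    print(select_action(0.2, task_policy, recovery_policy, q_risk))  # task acts
    print(select_action(0.9, task_policy, recovery_policy, q_risk))  # recovery acts
```

In the far-from-constraint state (0.2) the task action is executed; in the near-violation state (0.9) the recovery policy overrides it. Decoupling the two objectives this way is what lets the task policy optimize reward without being distorted by safety penalties.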

References (35)
Matthias Heger. Consideration of Risk in Reinforcement Learning. Machine Learning Proceedings 1994, pp. 105-111 (1994). DOI: 10.1016/B978-1-55860-335-6.50021-0
Jeremy H. Gillula, Claire J. Tomlin. Guaranteed Safe Online Learning via Reachability: Tracking a Ground Target Using a Quadrotor. International Conference on Robotics and Automation, pp. 2723-2730 (2012). DOI: 10.1109/ICRA.2012.6225136
Peter Kazanzides, Zihan Chen, Anton Deguet, Gregory S. Fischer, Russell H. Taylor, Simon P. DiMaio. An Open-Source Research Kit for the da Vinci® Surgical System. International Conference on Robotics and Automation, pp. 6434-6439 (2014). DOI: 10.1109/ICRA.2014.6907809
P. Geibel, F. Wysotzki. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints. Journal of Artificial Intelligence Research, vol. 24, pp. 81-108 (2005). DOI: 10.1613/JAIR.1666
Michael J. Tobia, Tobias Sommer, Klaus Obermayer, Yun Shen. Risk-Sensitive Reinforcement Learning. Neural Computation, vol. 26, pp. 1298-1328 (2014). DOI: 10.1162/NECO_A_00600
Weiqiao Han, Sergey Levine, Pieter Abbeel. Learning Compound Multi-Step Controllers under Unknown Dynamics. Intelligent Robots and Systems, pp. 6435-6442 (2015). DOI: 10.1109/IROS.2015.7354297
Aviv Tamar, Shie Mannor, Yonatan Glassner. Policy Gradients Beyond Expectations: Conditional Value-at-Risk (2014).
Francesco Borrelli, Ugo Rosolia. Learning Model Predictive Control for Iterative Tasks. arXiv: Systems and Control (2016).
Felix Berkenkamp, Andreas Krause, Matteo Turchetta, Angela P. Schoellig. Safe Model-Based Reinforcement Learning with Stability Guarantees. Neural Information Processing Systems, vol. 30, pp. 908-918 (2017).