Authors: Chelsea Finn, Ken Goldberg, Joseph E. Gonzalez, Julian Ibarz, Minho Hwang
DOI:
Keywords: Task (project management), Constrained optimization, Machine learning, Computer science, Robot, Artificial intelligence, Obstacle, Obstacle avoidance, Constraint satisfaction, Constraint (information theory), Reinforcement learning
Abstract: Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint-violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes reward, and a recovery policy that guides the agent when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including contact-rich manipulation tasks and an image-based navigation task, and on an obstacle avoidance task on a physical robot. We compare Recovery RL against 5 prior safe RL methods that jointly optimize for task performance and safety via constrained optimization or reward shaping, and find that Recovery RL outperforms the next best prior method across all domains. Results suggest that Recovery RL trades off constraint violations and task successes 2 to 80 times more efficiently in the simulation domains and 3 times more efficiently in the physical experiments. See https URL for videos and supplementary material.
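The two-policy scheme described in the abstract suggests a simple action-selection rule: a safety critic, pretrained on offline data of constraint violations, screens each proposed task action and hands control to the recovery policy when the estimated risk is high. Below is a minimal sketch of that rule under stated assumptions; the names `task_policy`, `recovery_policy`, `safety_critic`, and the threshold `eps_risk` are illustrative, not the authors' implementation.

```python
# Hedged sketch of the Recovery RL action-selection rule described in the
# abstract. All names here (task_policy, recovery_policy, safety_critic,
# eps_risk) are hypothetical stand-ins, not the paper's actual code.

def select_action(state, task_policy, recovery_policy, safety_critic,
                  eps_risk=0.3):
    """Execute the task policy's action unless the safety critic predicts
    a likely constraint violation, in which case defer to recovery."""
    a_task = task_policy(state)
    # safety_critic(state, action) is assumed to estimate the chance of a
    # future constraint violation; per the abstract, it would be learned
    # from offline data on constraint-violating zones before policy learning.
    if safety_critic(state, a_task) <= eps_risk:
        return a_task                   # proposed action judged safe
    return recovery_policy(state)       # steer the agent away from violation
```

One design point this separation buys, as the abstract argues: the task policy can optimize reward alone, without a shaped or constrained objective, because constraint satisfaction is delegated entirely to the recovery policy and its critic.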