Safe Exploration in Continuous Action Spaces

Authors: Krishnamurthy Dvijotham, Yuval Tassa, Cosmin Paduraru, Todd Hester, Gal Dalal

DOI:

Keywords:

Abstract: … An appropriate Lyapunov function was identified for policy at… 5 depicts the drawbacks of reward shaping for ensuring safety. … the best reward shaping choice, and to no reward shaping at …

References (16)
Martin Enqvist. Linear models of nonlinear systems. Seminar presented at the Dept. of Automatic Control at Lund University, Sweden, May 11, 2006 (2005).
Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv: Learning (2014).
Guy Shani, David Heckerman, Ronen I Brafman, Craig Boutilier. An MDP-Based Recommender System. Journal of Machine Learning Research, vol. 6, pp. 1265-1295 (2005). doi:10.5555/1046920.1088715
Peter L. Bartlett, Jonathan Baxter. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, vol. 15, pp. 319-350 (2001). doi:10.1613/JAIR.806
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). doi:10.1038/NATURE14236
Emanuel Todorov, Tom Erez, Yuval Tassa. MuJoCo: A physics engine for model-based control. Intelligent Robots and Systems, pp. 5026-5033 (2012). doi:10.1109/IROS.2012.6386109
Yuval Tassa, Daan Wierstra, Alexander Pritzel, Tom Erez, Jonathan J. Hunt, Nicolas Heess, David Silver, Timothy P. Lillicrap. Continuous control with deep reinforcement learning. arXiv: Learning (2015).