Safe Reinforcement Learning via Shielding

Authors: Bettina Könighofer, Scott Niekum, Roderick Bloem, Ufuk Topcu, Ruediger Ehlers

DOI:

Keywords:

Abstract: … , we achieve safe reinforcement learning, which we … 1 Safe RL is the process of learning an optimal policy while satisfying a temporal logic safety specification ϕs during the learning …

References (4)
L. G. Valiant, A theory of the learnable. Symposium on the Theory of Computing, vol. 27, pp. 1134-1142 (1984), 10.1145/800057.808710
Amir Pnueli, The temporal logic of programs. 18th Annual Symposium on Foundations of Computer Science (SFCS 1977), pp. 46-57 (1977), 10.1109/SFCS.1977.32
R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction (1998)
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015), 10.1038/NATURE14236