Self-Supervised Online Reward Shaping in Sparse-Reward Environments.

Authors: Rudolf Lioutikov, Scott Niekum, Ufuk Topcu, Wonjoon Goo, Farzan Memarian

DOI:

Keywords: Computer science, Function (mathematics), Inference, Machine learning, Sample (statistics), SIGNAL (programming language), Reinforcement learning, Classifier (linguistics), Artificial intelligence

Abstract: … reward inference and policy update steps: the original sparse reward provides a self-supervisory signal for reward … newly inferred, typically dense reward function. We introduce theory …

References (33)
Jonathan Daniel Sorg, Satinder S. Baveja. The optimal reward problem: designing effective reward for bounded agents. University of Michigan (2011).
Andrew Y. Ng, Stuart J. Russell, Daishi Harada. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning, pp. 278-287 (1999).
Jens Kober, J. Andrew Bagnell, Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, vol. 32, pp. 1238-1274 (2013). DOI: 10.1177/0278364913495721
Pieter Abbeel, Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. Twenty-first International Conference on Machine Learning - ICML '04, pp. 1-8 (2004). DOI: 10.1145/1015330.1015430
Andrew Y. Ng, Stuart Russell. Algorithms for Inverse Reinforcement Learning. International Conference on Machine Learning, pp. 663-670 (2000).
R.S. Sutton, A.G. Barto. Reinforcement Learning: An Introduction (1998).
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). DOI: 10.1038/NATURE14236