Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Authors: Rudolf Lioutikov, Scott Niekum, Ufuk Topcu, Wonjoon Goo, Farzan Memarian

Keywords: Computer science, Function (mathematics), Inference, Machine learning, Sample (statistics), SIGNAL (programming language), Reinforcement learning, Classifier (linguistics), Artificial intelligence

Abstract: … reward inference and policy update steps: the original sparse reward provides a self-supervisory signal for reward … newly inferred, typically dense reward function. We introduce theory …
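
The abstract describes alternating reward inference with policy updates, using the sparse reward as a self-supervisory signal. The sketch below covers only the reward-inference half. It assumes that the signal takes the form of ranking observed trajectories by their sparse return. It then fits a linear per-state reward with a pairwise Bradley-Terry likelihood, which is a special case of Luce's choice model. The linear features, the hypothetical helpers `infer_reward` and `dense_reward`, and the hyperparameters are illustrative assumptions. This is not the authors' implementation, and the policy-update step is omitted.

```python
import math
from typing import List, Sequence

# A trajectory is a list of state feature vectors phi(s).
Trajectory = List[Sequence[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense_reward(w: Sequence[float], state: Sequence[float]) -> float:
    """Hypothetical learned per-state reward r(s) = w . phi(s); linearity is an assumption."""
    return sum(wi * xi for wi, xi in zip(w, state))


def infer_reward(trajs: List[Trajectory], sparse_returns: List[float],
                 lr: float = 0.05, epochs: int = 200) -> List[float]:
    """Fit w so trajectories with higher sparse return get higher learned return.

    Pairwise Bradley-Terry / Luce likelihood (an assumed formulation):
        P(tau_i preferred over tau_j) = sigmoid(R_w(tau_i) - R_w(tau_j)),
    where R_w(tau) sums r(s) over the states of tau. Pairs with equal sparse
    return carry no ranking signal and are skipped.
    """
    dim = len(trajs[0][0])
    w = [0.0] * dim
    # With a linear reward, R_w(tau) = w . F(tau), where F(tau) sums phi(s) over tau.
    feat_sums = [[sum(s[k] for s in t) for k in range(dim)] for t in trajs]
    pairs = [(i, j) for i in range(len(trajs)) for j in range(len(trajs))
             if sparse_returns[i] > sparse_returns[j]]
    if not pairs:
        return w  # all trajectories tied: the sparse reward gives no ranking signal yet
    for _ in range(epochs):
        grad = [0.0] * dim
        for i, j in pairs:
            diff = [a - b for a, b in zip(feat_sums[i], feat_sums[j])]
            p = sigmoid(sum(wk * dk for wk, dk in zip(w, diff)))
            # Gradient of log sigmoid(w . diff) with respect to w is (1 - p) * diff.
            for k in range(dim):
                grad[k] += (1.0 - p) * diff[k]
        # Gradient ascent on the mean pairwise log-likelihood.
        w = [wk + lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


# Toy 1-D corridor: each state's feature is [position]; the goal sits at position 5.
trajs = [
    [[1.0], [2.0], [3.0], [4.0], [5.0]],  # reaches the goal
    [[1.0], [1.0], [2.0], [1.0], [0.0]],  # wanders, never reaches it
    [[0.0], [1.0], [0.0], [1.0], [2.0]],  # never reaches it
]
sparse_returns = [1.0, 0.0, 0.0]  # sparse reward: 1 only when the goal is reached
w = infer_reward(trajs, sparse_returns)
print(w, dense_reward(w, [4.0]), dense_reward(w, [1.0]))
```

In this toy case, the inferred reward increases with position. It therefore gives a dense signal toward the goal in states where the original sparse reward is zero. Because the toy rankings are perfectly separable, the likelihood keeps improving as `w` grows. A practical version would cap training or add regularization before handing the reward to the policy-update step.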

References (33)

Jonathan Daniel Sorg, Satinder S. Baveja. The optimal reward problem: designing effective reward for bounded agents. University of Michigan (2011).

Arpad E. Elo. The rating of chessplayers, past and present (1978).

Andrew Y. Ng, Daishi Harada, Stuart J. Russell. Policy invariance under reward transformations: Theory and application to reward shaping. International Conference on Machine Learning (ICML), pp. 278-287 (1999).

Jens Kober, J. Andrew Bagnell, Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, vol. 32, pp. 1238-1274 (2013). doi:10.1177/0278364913495721

Pieter Abbeel, Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04), pp. 1-8 (2004). doi:10.1145/1015330.1015430

Roger N. Shepard. Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, vol. 22, pp. 325-345 (1957). doi:10.1007/BF02288967

Andrew Y. Ng, Stuart Russell. Algorithms for inverse reinforcement learning. International Conference on Machine Learning (ICML), pp. 663-670 (2000).

R. Duncan Luce. Individual choice behavior: A theoretical analysis (1979).

R.S. Sutton, A.G. Barto. Reinforcement learning: An introduction (1998).

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). doi:10.1038/nature14236