Reward (Mis)design for Autonomous Driving.

Authors: Alessandro Allievi, Peter Stone, W. Bradley Knox, Holger Banzhaf, Felix Schmitt

DOI:

Keywords:

Abstract: This paper considers the problem of reward design for autonomous driving (AD), with insights that are also applicable to cost functions and performance metrics more generally. Herein we develop 8 simple sanity checks for identifying flaws in reward functions. These sanity checks are applied to reward functions from past work on reinforcement learning (RL) for driving, revealing near-universal flaws that might exist pervasively across reward design for other tasks as well. Lastly, we explore promising directions that may help future researchers design reward functions for AD.
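One recurring class of reward-design flaw the abstract alludes to can be illustrated with a toy check: does a candidate driving reward make indefinite idling return-optimal compared to completing the trip? The reward terms and threshold below are illustrative assumptions for this sketch, not the paper's actual sanity checks.

```python
# Minimal sketch of a reward sanity check (hypothetical reward terms).

def discounted_return(rewards, gamma=0.99):
    """Discounted sum of a finite reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def idling_return(per_step_reward, gamma=0.99):
    """Closed-form return of receiving per_step_reward forever."""
    return per_step_reward / (1.0 - gamma)

# Toy reward: -0.1 per step alive (time penalty), +100 on reaching the goal.
drive_rewards = [-0.1] * 50 + [100.0]  # complete the trip in 51 steps
idle_per_step = 0.0                    # idling at the start earns nothing

drive = discounted_return(drive_rewards)
idle = idling_return(idle_per_step)
assert drive > idle, "flaw: idling is return-optimal under this reward"
```

Under these assumed terms the check passes; replacing the time penalty with, say, a per-step "alive" bonus would flip the inequality and flag the reward as flawed.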

References (3)
Andrew Y. Ng, Daishi Harada, Stuart J. Russell. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning (ICML), pp. 278-287 (1999)
Markus Wulfmeier, Peter Ondruska, Ingmar Posner. Maximum Entropy Deep Inverse Reinforcement Learning. arXiv preprint (2015)
Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press (1998)