Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning

Authors: Changxi You, Jianbo Lu, Dimitar Filev, Panagiotis Tsiotras

DOI: 10.1016/J.ROBOT.2019.01.003

Keywords:

Abstract: Autonomous vehicles promise to improve traffic safety while, at the same time, increasing fuel efficiency and reducing congestion. They represent the main trend in future intelligent transportation systems. This paper concentrates on the planning problem of autonomous vehicles in traffic. We model the interaction between the autonomous vehicle and the environment as a stochastic Markov decision process (MDP) and consider the driving style of an expert driver as the target to be learned. The road geometry is taken into consideration in the MDP model in order to incorporate more diverse driving styles. The desired, expert-like driving behavior of the autonomous vehicle is obtained as follows: First, we design the reward function of the corresponding MDP and determine the optimal driving strategy for the autonomous vehicle using reinforcement learning techniques. Second, we collect a number of demonstrations from an expert driver and learn the optimal driving strategy based on the data using inverse reinforcement learning. The unknown reward function of the expert driver is approximated using a deep neural network (DNN). We clarify and validate the application of the maximum entropy principle (MEP) to learn the DNN reward function, and provide the necessary derivations for using the parameterized feature (reward) function. Simulation results demonstrate the desired driving behaviors of the autonomous vehicle using both the reinforcement learning and inverse reinforcement learning techniques.
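The second step described above, recovering the expert's reward from demonstrations via the maximum entropy principle, can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's implementation: the toy 4x4 grid MDP, the function names, and the linear per-state reward stand in for the paper's driving MDP with road geometry and its DNN reward. The gradient structure is the standard MaxEnt IRL one (empirical minus expected state-visitation frequencies); with a DNN reward, that same gradient would simply be backpropagated through the network.

```python
import numpy as np

# Minimal MaxEnt IRL sketch on a toy deterministic grid MDP (hypothetical
# setup; the paper uses a driving MDP and a DNN reward, not a linear one).

N_STATES, N_ACTIONS, GAMMA = 16, 4, 0.95

def make_transitions():
    """4x4 grid; actions = up/down/left/right, deterministic, clipped at walls."""
    P = np.zeros((N_STATES, N_ACTIONS), dtype=int)
    for s in range(N_STATES):
        r, c = divmod(s, 4)
        moves = [(max(r - 1, 0), c), (min(r + 1, 3), c),
                 (r, max(c - 1, 0)), (r, min(c + 1, 3))]
        for a, (nr, nc) in enumerate(moves):
            P[s, a] = nr * 4 + nc
    return P

def soft_value_iteration(r, P, iters=100):
    """Soft (MaxEnt) Bellman backups; returns the stochastic policy pi[s, a]."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = r[:, None] + GAMMA * V[P]                 # Q[s, a]
        Qmax = Q.max(axis=1, keepdims=True)           # stabilized log-sum-exp
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True))).ravel()
    return np.exp(Q - V[:, None])                     # pi(a|s) = exp(Q - V)

def expected_visitation(pi, P, start, horizon):
    """Expected state-visitation frequencies under pi from a fixed start state."""
    d = np.zeros(N_STATES)
    d[start] = 1.0
    mu = d.copy()
    for _ in range(horizon - 1):
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                d_next[P[s, a]] += d[s] * pi[s, a]
        d = d_next
        mu += d
    return mu

# Toy "expert" demonstrations: trajectories given as state sequences.
demos = [[0, 1, 2, 3, 7, 11, 15]] * 5
mu_D = np.zeros(N_STATES)
for traj in demos:
    for s in traj:
        mu_D[s] += 1.0
mu_D /= len(demos)                                    # empirical visitation counts

# Linear reward r(s) = theta[s] stands in for the paper's DNN: with a DNN,
# the gradient (mu_D - mu) would be backpropagated through the network.
P = make_transitions()
theta = np.zeros(N_STATES)
for step in range(200):
    pi = soft_value_iteration(theta, P)
    mu = expected_visitation(pi, P, start=0, horizon=len(demos[0]))
    theta += 0.1 * (mu_D - mu)    # MaxEnt gradient: match visitation frequencies
```

At convergence the learned reward induces a policy whose expected state visitations match the demonstrations, which is exactly the MEP matching condition the paper derives for its parameterized reward; the forward RL step in the paper corresponds to solving the planning problem once such a reward is fixed.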
