Imitation with Neural Density Models

Authors: Stefano Ermon, Yanan Sui, Jiaming Song, Yang Song, Kuno Kim

Abstract: We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning …
