Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

Authors: Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

Keywords: Benchmark (computing), Machine learning, Ranking, Task (project management), Noise (video), Artificial intelligence, Function (engineering), Reinforcement learning, Computer science, Set (psychology)

Abstract: A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward …

References (49)
Riad Akrour, Marc Schoenauer, Michele Sebag. Preference-based policy learning. European Conference on Machine Learning, vol. 6911, pp. 12-27 (2011). doi:10.1007/978-3-642-23780-5_11
Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv: Learning (2014).
Eyal Amir, Deepak Ramachandran. Bayesian inverse reinforcement learning. International Joint Conference on Artificial Intelligence, vol. 51, pp. 2586-2591 (2007).
Andrew Y. Ng, Stuart J. Russell, Daishi Harada. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning, pp. 278-287 (1999).
Ingmar Posner, Markus Wulfmeier, Peter Ondruska. Maximum Entropy Deep Inverse Reinforcement Learning. arXiv: Learning (2015).
Brenna D. Argall, Sonia Chernova, Manuela Veloso, Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, vol. 57, pp. 469-483 (2009). doi:10.1016/J.ROBOT.2008.10.024
Pieter Abbeel, Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. Twenty-first International Conference on Machine Learning (ICML '04), pp. 1-8 (2004). doi:10.1145/1015330.1015430
Dean A. Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, vol. 3, pp. 88-97 (1991). doi:10.1162/NECO.1991.3.1.88
Shao Zhifei, Er Meng Joo. A survey of inverse reinforcement learning techniques. International Journal of Intelligent Computing and Cybernetics, vol. 5, pp. 293-311 (2012). doi:10.1108/17563781211255862