Bayesian inverse reinforcement learning

Authors: Eyal Amir, Deepak Ramachandran

DOI:

Keywords: Probability distribution, Reward learning, Generalization error, Active learning (machine learning), Machine learning, Temporal difference learning, Markov decision process, Unsupervised learning, Stability (learning theory), Learning classifier system, Semi-supervised learning, Q-learning, Apprenticeship learning, Reinforcement learning, Preference elicitation, Instance-based learning, Preference learning, Artificial intelligence, Heuristic, Computer science

Abstract: Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we show how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions. We present efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions. Experimental results show strong improvement for our methods over previous heuristic-based approaches.
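
As a rough illustration of what deriving a probability distribution over reward functions from an expert's actions can look like, the sketch below computes a posterior by brute-force enumeration on a toy problem. Everything in it is an assumption made for illustration and is not taken from this record: the three-state chain MDP, the uniform prior over a small grid of candidate reward vectors, the Boltzmann-style likelihood in which the expert picks action a in state s with probability proportional to exp(ALPHA * Q*(s, a)), and all names (`q_values`, `log_likelihood`, `posterior`, `ALPHA`, `GAMMA`). Enumeration stands in for the efficient algorithms the abstract mentions and would not scale beyond tiny problems.

```python
import itertools
import math

N_STATES = 3          # toy chain: states 0, 1, 2 (hypothetical example)
ACTIONS = (-1, +1)    # move left / move right; walls clamp the move
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 5.0           # assumed confidence in the expert's optimality


def step(s, a):
    """Deterministic transition on the chain."""
    return min(max(s + a, 0), N_STATES - 1)


def q_values(reward, iters=200):
    """Optimal Q-values for a state-based reward, via value iteration."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(reward[s] + GAMMA * v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return {(s, a): reward[s] + GAMMA * v[step(s, a)]
            for s in range(N_STATES) for a in ACTIONS}


def log_likelihood(reward, demos):
    """Log-probability of the demonstrations under the assumed Boltzmann model."""
    q = q_values(reward)
    total = 0.0
    for s, a in demos:
        log_z = math.log(sum(math.exp(ALPHA * q[(s, b)]) for b in ACTIONS))
        total += ALPHA * q[(s, a)] - log_z
    return total


def posterior(demos, candidates):
    """Posterior over candidate rewards under a uniform prior."""
    log_post = [log_likelihood(r, demos) for r in candidates]
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


# Candidate rewards: each state's reward is either 0 or 1 (8 candidates).
candidates = list(itertools.product((0.0, 1.0), repeat=N_STATES))
# The expert is observed moving right in every state.
demos = [(0, +1), (1, +1), (2, +1)]

post = posterior(demos, candidates)
map_reward = candidates[max(range(len(candidates)), key=post.__getitem__)]
print(map_reward)
```

In this toy setting the posterior concentrates on reward vectors that make moving right worthwhile, while reward vectors that are constant across states explain the demonstrations no better than chance and receive little mass.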
