Authors: Eyal Amir, Deepak Ramachandran
DOI:
Keywords: Probability distribution, Reward learning, Generalization error, Active learning (machine learning), Machine learning, Temporal difference learning, Markov decision process, Unsupervised learning, Stability (learning theory), Learning classifier system, Semi-supervised learning, Q-learning, Apprenticeship learning, Reinforcement learning, Preference elicitation, Instance-based learning, Preference learning, Artificial intelligence, Heuristic, Computer science
Abstract: Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we show how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions. We present efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions. Experimental results show a strong improvement for our methods over previous heuristic-based approaches.
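
The core idea in the abstract, combining a prior over reward functions with a likelihood on the expert's observed actions to obtain a posterior distribution over rewards, can be illustrated with a small sketch. The snippet below is a minimal, assumption-laden illustration rather than the paper's implementation: it assumes a toy 5-state MDP, a Boltzmann-rational expert likelihood with a hypothetical confidence parameter alpha, a uniform prior on rewards in [0, 1], and a naive random-walk Metropolis-Hastings sampler as a crude stand-in for the paper's more efficient algorithms. All constants and helper names are illustrative.

```python
# Minimal Bayesian IRL sketch on a toy discrete MDP (illustrative only).
# Assumed expert model: P(demo | R) proportional to exp(alpha * Q*(s, a; R)).
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, GAMMA, ALPHA = 5, 2, 0.9, 5.0

# Fixed random transition dynamics P[a, s, s'] for the toy MDP.
P = rng.dirichlet(np.ones(N_STATES), size=(N_ACTIONS, N_STATES))

def q_values(reward, n_iter=200):
    """Compute Q*(s, a) for a state-only reward via value iteration."""
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_iter):
        v = q.max(axis=1)
        q = reward[:, None] + GAMMA * np.einsum('ast,t->sa', P, v)
    return q

def log_likelihood(reward, demos):
    """log P(demos | R) under a Boltzmann-rational expert model."""
    q = ALPHA * q_values(reward)
    logz = np.logaddexp.reduce(q, axis=1)       # per-state normalizer
    return sum(q[s, a] - logz[s] for s, a in demos)

# Expert demonstrations: (state, action) pairs from a hidden true reward.
true_reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi_star = q_values(true_reward).argmax(axis=1)
demos = [(s, pi_star[s]) for s in range(N_STATES)] * 10

# Random-walk Metropolis-Hastings over reward space:
# uniform prior on [0, 1]^N_STATES, symmetric Gaussian proposals.
reward = rng.uniform(0, 1, N_STATES)
ll = log_likelihood(reward, demos)
samples = []
for step in range(2000):
    proposal = np.clip(reward + rng.normal(0, 0.1, N_STATES), 0, 1)
    ll_new = log_likelihood(proposal, demos)
    if np.log(rng.uniform()) < ll_new - ll:     # MH accept/reject
        reward, ll = proposal, ll_new
    samples.append(reward.copy())

posterior_mean = np.mean(samples[500:], axis=0)  # discard burn-in
print("posterior mean reward:", np.round(posterior_mean, 2))
```

Averaging the retained samples gives a posterior mean reward, which in this toy setting should place most of its mass on the goal state; a point estimate drawn from such a posterior is what the reward learning and apprenticeship learning tasks in the abstract would then generalize over.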