Authors: Bernard Michini, Jonathan P. How
DOI: 10.1007/978-3-642-33486-3_10
Keywords:
Abstract: Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given knowledge of the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally-costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Experimental results are given for simple examples, showing performance comparable to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in the real-world scenario where the state space is large but the demonstration set is small.
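To make the partitioning idea concrete, below is a minimal illustrative sketch (not the authors' BNIRL implementation) of a Chinese restaurant process (CRP) mixture that partitions demonstrated states into clusters, each summarized by a single "subgoal" state. The names and parameters (ALPHA, SIGMA, gibbs_partition) and the Gaussian likelihood are assumptions chosen for illustration only.

```python
# Hypothetical sketch: CRP mixture partitioning of demonstration states into
# subgoal clusters via collapsed Gibbs sampling. Not the paper's algorithm;
# parameter names and the Gaussian likelihood are illustrative assumptions.
import numpy as np

ALPHA = 1.0   # CRP concentration: larger -> more clusters (assumed value)
SIGMA = 1.0   # likelihood width around a cluster's subgoal (assumed value)

def likelihood(x, subgoal):
    """Probability of an observation given a cluster subgoal (Gaussian proxy
    for 'this state is explained by heading toward the subgoal')."""
    return np.exp(-np.sum((x - subgoal) ** 2) / (2 * SIGMA ** 2))

def gibbs_partition(obs, n_iters=50, seed=0):
    """Sample a partition of observations with collapsed CRP Gibbs sweeps."""
    rng = np.random.default_rng(seed)
    z = np.zeros(len(obs), dtype=int)      # cluster assignment per observation
    for _ in range(n_iters):
        for i, x in enumerate(obs):
            # Hold out observation i, then weight each existing cluster by
            # (cluster size) x (likelihood), plus one weight for a new cluster.
            mask = np.arange(len(obs)) != i
            others, obs_others = z[mask], obs[mask]
            labels, counts = np.unique(others, return_counts=True)
            weights = []
            for lab, cnt in zip(labels, counts):
                subgoal = obs_others[others == lab].mean(axis=0)  # subgoal = cluster mean
                weights.append(cnt * likelihood(x, subgoal))
            weights.append(ALPHA * likelihood(x, x))              # new cluster seeded at x
            weights = np.array(weights) / np.sum(weights)
            choice = rng.choice(len(weights), p=weights)
            z[i] = labels[choice] if choice < len(labels) else others.max() + 1
    return z

if __name__ == "__main__":
    # Toy demonstration: states drawn around two distinct subgoals.
    rng = np.random.default_rng(1)
    demo = np.vstack([rng.normal([0.0, 0.0], 0.5, (20, 2)),
                      rng.normal([5.0, 5.0], 0.5, (20, 2))])
    print(gibbs_partition(demo))
```

In this toy run the sampler recovers two clusters whose means act as subgoals; the paper's method instead couples such a nonparametric partition with simple per-partition reward functions inside the MDP.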