Bayesian Nonparametric Inverse Reinforcement Learning

Authors: Bernard Michini, Jonathan P. How

DOI: 10.1007/978-3-642-33486-3_10

Abstract: Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Experimental results are given for simple examples showing performance comparable to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in the real-world scenario where the state space is large but the demonstration set is small.
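The partition-and-subgoals idea described in the abstract can be made concrete with a small sketch. Below is a minimal, illustrative Python implementation assuming a 5x5 deterministic gridworld, indicator subgoal rewards (1 at the subgoal state, 0 elsewhere), a softmax action likelihood, and a Chinese Restaurant Process (CRP) prior over partitions of the demonstration. All constants, the toy demonstration, and the function names are assumptions made here for illustration; this is not the authors' algorithm or code.

```python
# Minimal sketch of CRP-based partitioning of demonstrations into subgoal-explained
# groups. Assumed setup: 5x5 deterministic gridworld, indicator subgoal rewards,
# softmax (Boltzmann) action likelihood. Illustrative only, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
SIZE, GAMMA, BETA, ALPHA_CRP = 5, 0.95, 5.0, 1.0   # grid side, discount, softmax temp, CRP concentration
N_STATES, ACTIONS = SIZE * SIZE, [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up/down/left/right

def step(s, a):
    """Deterministic grid transition; moves off the grid leave the state unchanged."""
    r, c = divmod(s, SIZE)
    dr, dc = ACTIONS[a]
    r2, c2 = min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1)
    return r2 * SIZE + c2

def q_for_subgoal(g, iters=100):
    """Value-iterate a simple subgoal reward (1 on reaching state g) to get Q-values."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = np.array([[(1.0 if step(s, a) == g else 0.0) + GAMMA * V[step(s, a)]
                       for a in range(len(ACTIONS))] for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q

# Precompute softmax action likelihoods P(a | s, subgoal g) for every candidate subgoal.
LIK = np.empty((N_STATES, N_STATES, len(ACTIONS)))
for g in range(N_STATES):
    Q = q_for_subgoal(g)
    expQ = np.exp(BETA * (Q - Q.max(axis=1, keepdims=True)))
    LIK[g] = expQ / expQ.sum(axis=1, keepdims=True)

# Toy demonstration: walk right along the top row, then down the right column
# (two implicit subgoals: the top-right and bottom-right corners).
demo = [(0, 3), (1, 3), (2, 3), (3, 3), (4, 1), (9, 1), (14, 1), (19, 1)]

def gibbs(demo, sweeps=200):
    """CRP Gibbs sampling: jointly infer a partition of (s, a) pairs and one subgoal per part."""
    z = [0] * len(demo)                    # partition assignment of each observation
    subgoal = {0: rng.integers(N_STATES)}  # subgoal label of each partition
    for _ in range(sweeps):
        for i, (s, a) in enumerate(demo):
            old = z[i]
            z[i] = -1                      # remove observation i from its partition
            counts = {k: z.count(k) for k in set(z) if k != -1}
            if old not in counts:          # partition emptied: retire its subgoal
                subgoal.pop(old)
            # CRP prior times subgoal likelihood for each existing partition, plus a new one.
            ks = list(counts) + [max(counts, default=-1) + 1]
            w = [counts[k] * LIK[subgoal[k], s, a] for k in ks[:-1]]
            w.append(ALPHA_CRP * LIK[:, s, a].mean())  # new table: marginal over subgoals
            w = np.array(w) / sum(w)
            k = ks[rng.choice(len(ks), p=w)]
            if k not in subgoal:
                subgoal[k] = rng.integers(N_STATES)
            z[i] = k
        # Resample each partition's subgoal from its posterior given the assigned observations.
        for k in set(z):
            logp = np.array([sum(np.log(LIK[g, s, a]) for (s, a), zi in zip(demo, z) if zi == k)
                             for g in range(N_STATES)])
            p = np.exp(logp - logp.max()); p /= p.sum()
            subgoal[k] = rng.choice(N_STATES, p=p)
    return z, subgoal

z, subgoal = gibbs(demo)
print("assignments:", z)
print("inferred subgoals:", {k: int(subgoal[k]) for k in set(z)})
```

Each Gibbs sweep alternates between reassigning observations to partitions (CRP prior weighted by the subgoal likelihood) and resampling each partition's subgoal from its posterior. The search is over a small discrete set of simple indicator rewards rather than a large space of complex reward functions, which mirrors the efficiency argument in the abstract: cost scales with the demonstration set and candidate-subgoal set rather than with a complex reward hypothesis space.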
