Apprenticeship learning via inverse reinforcement learning

Authors: Pieter Abbeel, Andrew Y. Ng

DOI: 10.1145/1015330.1015430

Abstract: We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an …
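
As a minimal sketch of the modeling assumption stated in the abstract (a reward that is a linear combination of known features), the Python snippet below shows that the discounted return of an observed trajectory can then be written as the weight vector dotted with a discounted sum of feature vectors. The feature names, the weights `w`, the discount `gamma`, and the helper names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def linear_reward(w, phi_s):
    """Reward of one state: a weighted sum of its feature values."""
    return float(np.dot(w, phi_s))

def discounted_feature_sum(trajectory_features, gamma):
    """Sum of gamma**t * phi(s_t) over one observed trajectory."""
    feats = np.asarray(trajectory_features, dtype=float)
    discounts = gamma ** np.arange(len(feats))
    return discounts @ feats

# Hypothetical per-state features for a driving task: [speed, off_road, collision]
expert_trajectory = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
w = np.array([1.0, -2.0, -10.0])  # illustrative weights (assumption)
gamma = 0.9                       # illustrative discount factor (assumption)

mu = discounted_feature_sum(expert_trajectory, gamma)

# The same return, computed state by state and via the feature sum.
return_direct = sum(
    gamma**t * linear_reward(w, f) for t, f in enumerate(expert_trajectory)
)
return_via_features = float(w @ mu)
```

Under this assumption, two behaviors with similar discounted feature sums receive similar returns for any fixed weight vector, which is why demonstrations can be summarized by their features without writing the reward down explicitly.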

References (14)
Claude Sammut, Scott Hurst, Dana Kedzier, Donald Michie, Learning to Fly. International Conference on Machine Learning, pp. 385-393 (1992), DOI: 10.1016/B978-1-55860-247-2.50055-3
Stefan Schaal, Christopher G. Atkeson, Robot Learning From Demonstration. International Conference on Machine Learning, pp. 12-20 (1997)
Andrew Y. Ng, Stuart J. Russell, Daishi Harada, Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning, pp. 278-287 (1999)
N. Hogan, An Organizing Principle for a Class of Voluntary Movements. The Journal of Neuroscience, vol. 4, pp. 2745-2754 (1984), DOI: 10.1523/JNEUROSCI.04-11-02745.1984
Alan S. Manne, Linear Programming and Sequential Decisions. Management Science, vol. 6, pp. 259-267 (1960), DOI: 10.1287/MNSC.6.3.259
Andrew Y. Ng, Stuart Russell, Algorithms for Inverse Reinforcement Learning. International Conference on Machine Learning, pp. 663-670 (2000)
Y. Uno, M. Kawato, R. Suzuki, Formation and Control of Optimal Trajectory in Human Multijoint Arm Movement. Biological Cybernetics, vol. 61, pp. 89-101 (1989), DOI: 10.1007/BF00204593
R. Amit, M. Matarić, Learning Movement Sequences from Demonstration. International Conference on Development and Learning, pp. 203-208 (2002), DOI: 10.1109/DEVLRN.2002.1011867
Y. Kuniyoshi, M. Inaba, H. Inoue, Learning by Watching: Extracting Reusable Task Knowledge from Visual Observation of Human Performance. IEEE Transactions on Robotics and Automation, vol. 10, pp. 799-822 (1994), DOI: 10.1109/70.338535
Vladimir Naumovich Vapnik, Statistical Learning Theory. John Wiley & Sons (1998)