Active imitation learning

Authors: Aaron P. Shon, Deepak Verma, Rajesh P. N. Rao

DOI:

Keywords:

Abstract: Imitation learning, also called learning by watching or programming by demonstration, has emerged as a means of accelerating many reinforcement learning tasks. Previous work has shown the value of imitation in domains where a single mentor demonstrates execution of a known optimal policy for the benefit of a learning agent. We consider the more general scenario of learning from mentors who are themselves agents seeking to maximize their own rewards. We propose a new algorithm based on the concept of transferable utility for ensuring that an observer agent can learn efficiently in the context of a selfish, not necessarily helpful, mentor. We also address the questions of when an imitative agent should request help from a mentor, and when the mentor can be expected to acknowledge a request for help. In analogy with other types of active learning, we call the proposed approach active imitation learning.

References (1)
Pieter Abbeel, Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. Twenty-first International Conference on Machine Learning (ICML '04), pp. 1-8, 2004. DOI: 10.1145/1015330.1015430