Authors: Claire Tomlin, Anca D. Dragan, Andreea Bobu, Marius Wiggert
Keywords:
Abstract: When a person is not satisfied with how a robot performs a task, they can intervene to correct it. Reward learning methods enable the robot to adapt its reward function online based on such human input, but they rely on handcrafted features. When the correction cannot be explained by these features, recent work in deep Inverse Reinforcement Learning (IRL) suggests that the robot could ask for task demonstrations and recover a reward defined over the raw state space. Our insight is that rather than implicitly learning about the missing feature(s) from demonstrations, the robot should instead ask for data that explicitly teaches it what is missing. We introduce a new type of human input, which guides the robot through states where the feature being taught is highly expressed and states where it is not. We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function. By focusing the human input on the missing feature, our method decreases sample complexity and improves generalization of the learned reward above the deep IRL baseline. We show this with experiments on a physical 7DOF manipulator, as well as a user study conducted in a simulated environment.
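The idea of learning a missing feature from guided state traces and folding it into the reward can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 2D toy state space, the one-layer logistic feature model, the ordering loss on consecutive trace states, and all names (`make_trace`, `phi`, `reward`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "feature traces": each trace moves from a state where the
# hypothetical missing feature (here: proximity to the origin) is
# highly expressed to a state where it is not.
def make_trace(n=10):
    start = rng.normal(scale=0.2, size=2)               # feature high
    end = start + rng.normal(scale=0.2, size=2) + 3.0   # feature low
    alphas = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - alphas) * start + alphas * end

traces = [make_trace() for _ in range(20)]

# Tiny feature model phi_theta(s) = sigmoid(W.s + b), standing in for
# a neural network over the raw state space.
W = rng.normal(scale=0.1, size=2)
b = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def phi(S, W, b):
    return sigmoid(S @ W + b)

# Train with a logistic ordering loss on consecutive trace states:
# the feature should be higher earlier in the trace than later.
lr = 0.2
for _ in range(300):
    gW = np.zeros_like(W)
    for S in traces:
        z = S @ W + b
        dz = z[:-1] - z[1:]          # want each gap positive
        g = -(1.0 - sigmoid(dz))     # dLoss/d(gap) for -log sigmoid(gap)
        gz = np.zeros_like(z)        # backprop gap to both endpoints
        gz[:-1] += g
        gz[1:] -= g
        gW += S.T @ gz
    W -= lr * gW / len(traces)

starts = np.array([t[0] for t in traces])
ends = np.array([t[-1] for t in traces])
print("mean feature at trace starts vs ends:",
      phi(starts, W, b).mean(), phi(ends, W, b).mean())

# Integrate the learned feature into a linear reward alongside a
# handcrafted feature (weights here are illustrative).
def reward(s, w_known=np.array([1.0]), w_new=-1.0):
    known = np.array([np.linalg.norm(s)])   # e.g. distance-to-goal
    return w_known @ known + w_new * phi(s[None], W, b)[0]
```

After training, the learned feature scores trace starts higher than trace ends, so appending it to the reward's feature vector lets a weight on it penalize (or favor) the newly taught aspect of the state.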