Interactive Reinforcement Learning with Inaccurate Feedback

Authors: Taylor A. Kessler Faulkner, Elaine Schaertl Short, Andrea L. Thomaz

DOI: 10.1109/ICRA40945.2020.9197219

Abstract: Interactive Reinforcement Learning (RL) enables agents to learn from two sources: rewards taken from observations of the environment, and feedback or advice from a secondary critic source, such as human teachers or sensor feedback. The addition of information from a critic during the learning process allows agents to learn more quickly than non-interactive RL. There are many methods that allow policy feedback to be combined with RL. However, critics can often give imperfect information. In this work, we introduce a framework for characterizing Interactive RL methods with imperfect teachers and propose an algorithm, Revision Estimation from Partially Incorrect Resources (REPaIR), which can estimate corrections to imperfect feedback over time. We run experiments both in simulations and demonstrate performance on a physical robot, and find that when baseline algorithms do not have prior information on the exact quality of a feedback source, using REPaIR matches or improves the expected performance of these algorithms.