Authors: Taylor A. Kessler Faulkner, Elaine Schaertl Short, Andrea L. Thomaz
DOI: 10.1109/ICRA40945.2020.9197219
Keywords:
Abstract: Interactive Reinforcement Learning (RL) enables agents to learn from two sources: rewards taken from observations of the environment, and feedback or advice from a secondary critic source, such as human teachers or sensor feedback. The addition of information from a critic during the learning process allows agents to learn more quickly than non-interactive RL. There are many methods that allow policy feedback or advice to be combined with RL. However, critics can often give imperfect information. In this work, we introduce a framework for characterizing interactive RL methods with imperfect critics and propose an algorithm, Revision Estimation from Partially Incorrect Resources (REPaIR), which can estimate corrections to imperfect feedback over time. We run experiments both in simulations and demonstrate performance on a physical robot, and find that when baseline algorithms do not have prior exact knowledge of the quality of a feedback source, using REPaIR matches or improves the expected performance of these algorithms.
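
The abstract does not specify how REPaIR estimates its corrections, so the sketch below only illustrates the general setting it describes: an agent learning from environment rewards plus feedback from an imperfect critic, while maintaining a running estimate of how much that critic can be trusted. Everything here (the `ChainEnv` toy task, the `noisy_critic`, and the simple `trust` update) is a hypothetical stand-in for illustration, not the paper's algorithm.

```python
import random
from collections import defaultdict

# Illustrative sketch only: interactive RL with an imperfect critic whose
# reliability is estimated online. This is NOT the REPaIR algorithm itself.

class ChainEnv:
    """Tiny 1-D chain: the agent must move right to reach the goal state."""
    def __init__(self, n=6):
        self.n, self.state = n, 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(self.n - 1, self.state + (1 if action else -1)))
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done

def noisy_critic(state, action, error_rate=0.3):
    """Imperfect critic: approves 'right' moves, but is wrong error_rate of the time."""
    correct = 1.0 if action == 1 else -1.0
    return -correct if random.random() < error_rate else correct

def run(episodes=300, alpha=0.5, gamma=0.95, eps=0.1):
    env = ChainEnv()
    q = defaultdict(float)
    trust = 0.5  # running estimate of critic reliability (hypothetical scheme)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the two actions.
            a = random.randint(0, 1) if random.random() < eps else \
                max((0, 1), key=lambda x: q[(s, x)])
            s2, r, done = env.step(a)
            fb = noisy_critic(s, a)
            # TD error from the environment's own reward signal.
            td = r + gamma * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)]
            # Nudge the trust estimate toward how often the critic's sign
            # agrees with the sign of the TD error.
            agrees = (fb > 0) == (td >= 0)
            trust += 0.01 * ((1.0 if agrees else 0.0) - trust)
            # Shape the Q-update with critic feedback, weighted by trust.
            q[(s, a)] += alpha * (td + trust * 0.1 * fb)
            s = s2
    return q, trust

if __name__ == "__main__":
    q, trust = run()
    print(f"estimated critic trust: {trust:.2f}")
    print("greedy action at state 0:", max((0, 1), key=lambda a: q[(0, a)]))
```

In this toy setup the trust-weighted shaping term lets the agent benefit from mostly-correct advice before any environment reward is observed, while down-weighting a critic that frequently disagrees with the reward signal, which mirrors the intuition the abstract gives for learning from imperfect feedback sources.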