Authors: Elaine Schaertl Short, Isaac S. Sheidlower
Keywords:
Abstract: When a robot is deployed to learn a new task in a "real-world" environment, there may be multiple teachers and therefore multiple sources of feedback. Furthermore, there may be multiple optimal solutions for a given task, and teachers may have preferences among those various solutions. We present an Interactive Reinforcement Learning (I-RL) algorithm, Multi-Teacher Activated Policy Shaping (M-TAPS), which addresses the problem of learning from multiple teachers and leverages the differences between them as a means to explore the environment. We show that this algorithm can significantly increase an agent's robustness to the environment and can quickly adopt a teacher's preferences. Finally, we present a formal model for comparing human teachers and constructed oracle teachers in the way they provide feedback to a robot.