Aux-AIRL: End-to-end self-supervised reward learning for extrapolating beyond suboptimal demonstrations

Authors: Yuchen Cui, Bo Liu, Akanksha Saran, Stephen Giguere, Peter Stone

Abstract: Real-world human demonstrations are often suboptimal. How to extrapolate beyond suboptimal demonstrations is an important open research question. In this ongoing work, we analyze the success of a previous state-of-the-art self-supervised reward learning method that requires four sequential optimization steps, and propose a simple end-to-end imitation learning method, Aux-AIRL, that extrapolates from suboptimal demonstrations without requiring multiple optimization steps.