Aux-AIRL: End-to-end self-supervised reward learning for extrapolating beyond suboptimal demonstrations

Authors: Yuchen Cui, Bo Liu, Akanksha Saran, Stephen Giguere, Peter Stone

Abstract: Real-world human demonstrations are often suboptimal. How to extrapolate beyond suboptimal demonstrations is an important open research question. In this ongoing work, we analyze the success of a previous state-of-the-art self-supervised reward learning method that requires four sequential optimization steps, and propose a simple end-to-end imitation learning method, Aux-AIRL, that extrapolates from suboptimal demonstrations without requiring multiple optimization steps.