Policy Gradient Bayesian Robust Optimization for Imitation Learning

Authors: Zaynah Javed, Daniel S Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna

DOI:

Keywords:

Abstract: The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there …
