作者: Zaynah Javed , Daniel S Brown , Satvik Sharma , Jerry Zhu , Ashwin Balakrishna
DOI:
关键词:
摘要: The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there …