Authors: Wonjoon Goo, Scott Niekum
DOI:
Keywords:
Abstract: We introduce an offline reinforcement learning (RL) algorithm that explicitly clones a behavior policy to constrain value learning. In offline RL, it is often important to prevent a …