Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

Authors: Wonjoon Goo, Scott Niekum

DOI:

Keywords:

Abstract: We introduce an offline reinforcement learning (RL) algorithm that explicitly clones a behavior policy to constrain value learning. In offline RL, it is often important to prevent a …
