The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications

Authors: Serena Booth, W Bradley Knox, Julie Shah, Scott Niekum, Peter Stone


Abstract: In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of …

