The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications

Authors: Serena Booth, W Bradley Knox, Julie Shah, Scott Niekum, Peter Stone

Abstract: In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of …
