Authors: Serena Booth, W. Bradley Knox, Julie Shah, Scott Niekum, Peter Stone
DOI:
Keywords:
Abstract: In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of …