Authors: Shane Legg, Miljan Martic, Victoria Krakovna, Laurent Orseau, Ramana Kumar
DOI:
Keywords:
Abstract: How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two components: a baseline state and a measure of deviation from this baseline state. We argue that some of these incentives arise from the choice of baseline, and others from the choice of deviation measure. We introduce a new variant of the stepwise inaction baseline and a new deviation measure based on the relative reachability of states. The combination of these design choices avoids the given undesirable incentives, while simpler baselines and the unreachability measure fail. We demonstrate this empirically by comparing different combinations of design choices on a set of gridworld experiments designed to illustrate the possible bad incentives.
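The relative reachability deviation measure mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and it assumes per-state reachability values (e.g. discounted reachability of each state from a given starting state) have already been computed as vectors.

```python
import numpy as np

def relative_reachability_penalty(reach_baseline, reach_current):
    """Sketch of a relative-reachability-style deviation measure.

    reach_baseline[x] and reach_current[x] hold the reachability of
    state x from the baseline state and from the current state,
    respectively. The penalty averages the *reduction* in reachability
    relative to the baseline, so it is zero whenever every state is at
    least as reachable as it was from the baseline -- only lost
    reachability (irreversible side effects) is penalized.
    """
    reach_baseline = np.asarray(reach_baseline, dtype=float)
    reach_current = np.asarray(reach_current, dtype=float)
    return float(np.maximum(reach_baseline - reach_current, 0.0).mean())

# Example: state 1 became unreachable, state 2 became *more* reachable.
# Only the lost reachability of state 1 contributes to the penalty.
penalty = relative_reachability_penalty([1.0, 1.0, 0.5], [1.0, 0.0, 1.0])
```

Under a stepwise inaction baseline, the baseline state at each step would be the state that results from taking a no-op action instead of the agent's actual action, so the agent is penalized only for deviations attributable to its own latest action.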