Authors: Csaba Szepesvári, Dale Schuurmans, Bo Dai, Yinlam Chow, Lihong Li
DOI:
Keywords:
Abstract: We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only …