Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

Authors: Emma Brunskill, Steve Yadlowsky, Hongseok Namkoong, Ramtin Keramati

DOI:

Keywords: Econometrics; Robustness (computer science); Unobserved confounding; Computer science; Confounding; Pessimism; Psychological intervention

Abstract: When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision-making problems can estimate the performance of evaluation policies before deploying them. This assumption is frequently violated due to unobserved confounders, unrecorded variables that impact both decisions and their outcomes. We assess the robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy. When unobserved confounders can affect every decision in an episode, we demonstrate that even small amounts of per-decision confounding can heavily bias OPE methods. Fortunately, in a number of important settings found in healthcare, policy-making, operations, and technology, unobserved confounders may primarily affect only one of the many decisions made. Under this less pessimistic model of one-decision confounding, we propose an efficient loss-minimization-based procedure for computing worst-case bounds, and prove its statistical consistency. On two simulated healthcare examples---management of sepsis patients and developmental interventions for autistic children---where this is a reasonable model of confounding, our method invalidates non-robust results and provides meaningful certificates of robustness, allowing reliable policy selection even under unobserved confounding.
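The contrast the abstract draws between all-decision and one-decision confounding can be illustrated with a toy sensitivity-style bound on an importance-sampling (IS) estimate. This is a generic Γ-style sketch, not the paper's loss-minimization procedure: it assumes nonnegative returns and lets the true per-step importance ratio differ from the nominal one by a factor of at most `gamma` at each confounded decision, so the worst-case interval widens by `gamma ** k` when `k` decisions are confounded. All function names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def worst_case_is_bounds(returns, step_weights, gamma, n_confounded_steps):
    """Crude worst/best-case IS bounds under a multiplicative sensitivity model.

    returns: (n,) nonnegative trajectory returns
    step_weights: (n, H) nominal per-step importance ratios
    gamma: >= 1, maximum factor by which a confounded decision can
        distort the true importance ratio
    n_confounded_steps: how many of the H decisions may be confounded
    """
    # Nominal IS estimate: mean of return times product of per-step ratios.
    w = np.prod(step_weights, axis=1)
    nominal = float(np.mean(returns * w))
    # With nonnegative returns, the adversary pushes every trajectory
    # weight to the same extreme, so the interval scales by gamma ** k.
    slack = gamma ** n_confounded_steps
    return nominal / slack, nominal, nominal * slack

H, n = 5, 1000
returns = rng.uniform(0.0, 1.0, size=n)            # nonnegative returns
step_weights = rng.uniform(0.5, 1.5, size=(n, H))  # nominal per-step ratios
gamma = 1.5

lo_all, nominal, hi_all = worst_case_is_bounds(returns, step_weights, gamma, H)
lo_one, _, hi_one = worst_case_is_bounds(returns, step_weights, gamma, 1)

print(f"nominal IS estimate:            {nominal:.3f}")
print(f"all-decision confounding bounds: [{lo_all:.3f}, {hi_all:.3f}]")
print(f"one-decision confounding bounds: [{lo_one:.3f}, {hi_one:.3f}]")
```

Even at a modest `gamma = 1.5`, confounding every one of the `H = 5` decisions inflates the interval by a factor of roughly 7.6, while confounding a single decision inflates it by only 1.5, mirroring the abstract's point that per-decision confounding compounds across an episode.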
