Authors: Emma Brunskill, Leo Anthony Celi, Finale Doshi-Velez, Joseph Futoma, Sonali Parbhoo
DOI:
Keywords:
Abstract: Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high-stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data, and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations whose removal will have a large effect on the OPE estimate, and by formulating a set of rules for choosing which ones to present for domain validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes, kernel-based and linear least squares, as well as for importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make it more robust.
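As a point of reference for the kind of computation the abstract describes, the sketch below shows the exact leave-one-out influence of each observation on a scalar estimate derived from a single linear least-squares fit, using the closed form obtained from the Sherman-Morrison identity. This is only an illustrative building block, not the paper's algorithm: fitted Q-evaluation is iterative, and every name and data shape here is a hypothetical placeholder.

```python
import numpy as np

def loo_influence_linear(X, y, w_eval):
    """Exact leave-one-out influence of each observation on a scalar
    estimate derived from a linear least-squares fit.

    X      : (n, d) feature matrix (hypothetical encoding of transitions)
    y      : (n,)   regression targets
    w_eval : (d,)   weights defining the scalar estimate of interest,
             e.g. the mean features of initial states, so the estimate
             is w_eval @ beta (a predicted policy value)

    Returns an (n,) array whose i-th entry is
    (estimate with observation i removed) - (full-data estimate).
    """
    XtX_inv = np.linalg.inv(X.T @ X)               # (d, d)
    beta = XtX_inv @ (X.T @ y)                     # full-data fit
    resid = y - X @ beta                           # residuals e_i
    lev = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii
    # Sherman-Morrison: beta_(i) = beta - XtX_inv @ x_i * e_i / (1 - h_ii)
    delta_beta = -(XtX_inv @ X.T) * (resid / (1.0 - lev))  # (d, n)
    return w_eval @ delta_beta

# Flag the observations whose removal moves the estimate the most,
# as candidates to surface for expert review (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)
influence = loo_influence_linear(X, y, X[:10].mean(axis=0))
most_influential = np.argsort(-np.abs(influence))[:5]
print(most_influential, influence[most_influential])
```

Because each observation's effect has a closed form, the full influence profile costs one matrix inversion plus matrix products, rather than n separate refits, which is what makes ranking observations for human review tractable.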