Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Authors: Emma Brunskill, Leo Anthony Celi, Finale Doshi-Velez, Joseph Futoma, Sonali Parbhoo


Abstract: Off-policy evaluation (OPE) in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high-stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data, and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, enabling human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and by formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute influence functions exactly for fitted Q-evaluation with two different function classes, kernel-based and linear least squares, as well as for importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make the evaluation more robust.
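The abstract's central idea is to flag observations whose removal would substantially change the OPE estimate. As a minimal illustrative sketch (not the authors' implementation), the snippet below computes leave-one-out influences of trajectories on an ordinary importance sampling estimate; the trajectory format, policies `pi_e`/`pi_b`, and discount factor are hypothetical stand-ins.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary importance sampling estimate of the evaluation policy's value.

    Each trajectory is a list of (state, action, reward) tuples; pi_e and pi_b
    are callables returning action probabilities under the evaluation and
    behavior policies, respectively.
    """
    returns = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)   # cumulative importance weight
            ret += (gamma ** t) * r             # discounted return
        returns.append(weight * ret)
    return float(np.mean(returns))

def loo_influences(trajectories, pi_e, pi_b, gamma=0.99):
    """Influence of each trajectory: change in the OPE estimate when it is removed."""
    full = is_estimate(trajectories, pi_e, pi_b, gamma)
    influences = []
    for i in range(len(trajectories)):
        rest = trajectories[:i] + trajectories[i + 1:]
        influences.append(is_estimate(rest, pi_e, pi_b, gamma) - full)
    return np.array(influences)

# Trajectories with the largest absolute influence would be the ones flagged
# for inspection by a domain expert.
```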

References (3)
Doina Precup, Richard S. Sutton, Satinder P. Singh. Eligibility Traces for Off-Policy Policy Evaluation. International Conference on Machine Learning (ICML), pp. 759–766, 2000.
Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Josiah P. Hanna, Peter Stone, Scott Niekum. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. AAAI Conference on Artificial Intelligence, pp. 4933–4934, 2017.