Authors: Emma Brunskill, Leo Anthony Celi, Finale Doshi-Velez, Joseph Futoma, Sonali Parbhoo
DOI:
Keywords:
Abstract: Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high-stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data, and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations whose removal will have a large effect on the OPE estimate, and by formulating a set of rules for choosing which ones to present for domain validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes, kernel-based and linear least squares, as well as for importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make it more robust.
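As a point of reference for the kind of computation the abstract describes, the sketch below shows the exact leave-one-out influence of each observation on a scalar estimate derived from a single linear least-squares fit, using the closed form obtained from the Sherman-Morrison identity. This is only an illustrative building block, not the paper's algorithm: fitted Q-evaluation is iterative, and every name and data shape here is a hypothetical placeholder.

```python
import numpy as np

def loo_influence_linear(X, y, w_eval):
    """Exact leave-one-out influence of each observation on a scalar
    estimate derived from a linear least-squares fit.

    X      : (n, d) feature matrix (hypothetical encoding of transitions)
    y      : (n,)   regression targets
    w_eval : (d,)   weights defining the scalar estimate of interest,
             e.g. the mean features of initial states, so the estimate
             is w_eval @ beta (a predicted policy value)

    Returns an (n,) array whose i-th entry is
    (estimate with observation i removed) - (full-data estimate).
    """
    XtX_inv = np.linalg.inv(X.T @ X)               # (d, d)
    beta = XtX_inv @ (X.T @ y)                     # full-data fit
    resid = y - X @ beta                           # residuals e_i
    lev = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii
    # Sherman-Morrison: beta_(i) = beta - XtX_inv @ x_i * e_i / (1 - h_ii)
    delta_beta = -(XtX_inv @ X.T) * (resid / (1.0 - lev))  # (d, n)
    return w_eval @ delta_beta

# Flag the observations whose removal moves the estimate the most,
# as candidates to surface for expert review (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)
influence = loo_influence_linear(X, y, X[:10].mean(axis=0))
most_influential = np.argsort(-np.abs(influence))[:5]
print(most_influential, influence[most_influential])
```

Because each observation's effect has a closed form, the full influence profile costs one matrix inversion plus matrix products, rather than n separate refits, which is what makes ranking observations for human review tractable.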