Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation.

Authors: Scott Niekum, Peter Stone, Josiah P. Hanna

DOI:

Keywords:

Abstract: For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for when the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
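The general idea of bootstrapping with a learned model can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`fit_tabular_model`, `model_value`, `bootstrap_lower_bound`) is hypothetical. It assumes a small discrete MDP, trajectories stored as lists of `(state, action, reward, next_state)` tuples, a maximum-likelihood tabular model, exact finite-horizon evaluation of the policy on that model, and a plain percentile bootstrap. The resulting bound is approximate, not a strict guarantee.

```python
import numpy as np

def fit_tabular_model(trajectories, n_states, n_actions):
    """Maximum-likelihood transition/reward model from (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    start = np.zeros(n_states)
    for traj in trajectories:
        start[traj[0][0]] += 1  # empirical start-state distribution
        for s, a, r, s_next in traj:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    seen = visits > 0
    # Assumption: unseen (s, a) pairs get uniform transitions and zero reward.
    P = np.full((n_states, n_actions, n_states), 1.0 / n_states)
    P[seen] = counts[seen] / visits[seen][:, None]
    R = np.zeros((n_states, n_actions))
    R[seen] = reward_sum[seen] / visits[seen]
    return P, R, start / start.sum()

def model_value(P, R, d0, policy, horizon):
    """Expected undiscounted return of `policy` under the learned model."""
    V = np.zeros(P.shape[0])
    for _ in range(horizon):
        Q = R + P @ V                  # shape (n_states, n_actions)
        V = (policy * Q).sum(axis=1)   # policy[s, a] = probability of a in s
    return float(d0 @ V)

def bootstrap_lower_bound(trajectories, policy, n_states, n_actions,
                          horizon, delta=0.05, n_boot=200, seed=0):
    """Approximate (1 - delta) lower confidence bound via percentile bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement, then refit the model.
        idx = rng.integers(0, n, size=n)
        sample = [trajectories[i] for i in idx]
        P, R, d0 = fit_tabular_model(sample, n_states, n_actions)
        estimates.append(model_value(P, R, d0, policy, horizon))
    return float(np.quantile(estimates, delta))
```

Calling `bootstrap_lower_bound(trajectories, policy, n_states, n_actions, horizon)` returns the `delta` quantile of the bootstrapped model-based value estimates. Because the model itself can be biased, this quantile inherits that bias, which is the concern the abstract raises.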
