Authors: Vianney Perchet, Thomas Nedelec, Nicolas Le Roux
DOI:
Keywords: Importance sampling, Estimator, Mathematics, Statistics, Counterfactual thinking, Econometrics
Abstract: We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal. We then exhibit properties that optimal estimators should possess. In the case where examples have been gathered using multiple policies, we show that fused estimators dominate basic ones but can still be improved.