A comparative study of counterfactual estimators

Authors: Vianney Perchet, Thomas Nedelec, Nicolas Le Roux

DOI:

Keywords: Importance sampling, Estimator, Mathematics, Statistics, Counterfactual thinking, Econometrics

Abstract: We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal. We then exhibit properties optimal estimators should possess. In the case where examples have been gathered using multiple policies, we show that fused estimators dominate basic ones but can still be improved.
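To make the three estimators named in the abstract concrete, here is a minimal sketch using their textbook definitions; it is not taken from the paper itself. The two-action bandit, the policy probabilities, the reward means, and all function names are hypothetical values chosen only for illustration.

```python
import random


def empirical_average(rewards):
    """Plain mean of logged rewards; ignores the policy mismatch entirely."""
    return sum(rewards) / len(rewards)


def basic_importance_sampling(rewards, weights):
    """Mean of weight * reward, with weight = target_prob / logging_prob."""
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def normalized_importance_sampling(rewards, weights):
    """Weighted mean of rewards: divides by the sum of weights, not by n."""
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


def simulate(n=10000, seed=0):
    """Log n (reward, weight) pairs from a hypothetical two-action bandit."""
    rng = random.Random(seed)
    logging_policy = {0: 0.8, 1: 0.2}  # assumed policy that gathered the data
    target_policy = {0: 0.3, 1: 0.7}   # assumed policy we want to evaluate
    mean_reward = {0: 0.2, 1: 0.6}     # assumed Bernoulli reward means
    rewards, weights = [], []
    for _ in range(n):
        action = 0 if rng.random() < logging_policy[0] else 1
        reward = 1.0 if rng.random() < mean_reward[action] else 0.0
        rewards.append(reward)
        weights.append(target_policy[action] / logging_policy[action])
    return rewards, weights


rewards, weights = simulate()
ea = empirical_average(rewards)
bis = basic_importance_sampling(rewards, weights)
nis = normalized_importance_sampling(rewards, weights)

# Under these assumed numbers the target policy's true value is
# 0.3 * 0.2 + 0.7 * 0.6 = 0.48, while the logging policy's is 0.28.
print(f"empirical average:     {ea:.3f}")
print(f"basic importance:      {bis:.3f}")
print(f"normalized importance: {nis:.3f}")
```

In this toy setting the empirical average tracks the logging policy's value rather than the target's, whereas both importance-sampling variants should land near the target value. The basic variant is unbiased but can have high variance when weights are large; the normalized variant trades a small bias for lower variance.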

References (14)
Csaba Szepesvári, Rémi Munos, Lihong Li. On Minimax Optimal Offline Policy Evaluation. arXiv: Artificial Intelligence (2014).
Doina Precup, Satinder P. Singh, Richard S. Sutton. Eligibility Traces for Off-Policy Policy Evaluation. International Conference on Machine Learning, pp. 759-766 (2000).
Christian R. Shelton. Policy improvement for POMDPs using normalized importance sampling. Uncertainty in Artificial Intelligence, pp. 496-503 (2001).
D. P. Bertsekas, J. N. Tsitsiklis. Neuro-dynamic programming: an overview. Conference on Decision and Control, vol. 1, pp. 560-564 (1995). DOI: 10.1109/CDC.1995.478953
John Michael Hammersley. Monte Carlo methods (1964).
Tomaso Poggio, Christian Robert Shelton. Importance sampling for reinforcement learning with multiple objectives. Massachusetts Institute of Technology (2001).
M. J. D. Powell, J. Swann. Weighted Uniform Sampling — a Monte Carlo Technique for Reducing Variance. IMA Journal of Applied Mathematics, vol. 2, pp. 228-236 (1966). DOI: 10.1093/IMAMAT/2.3.228
Elon Portugaly, Léon Bottou, D. Max Chickering, Denis X. Charles, Dipankar Ray, Jonas Peters, Patrice Simard, Ed Snelson, Joaquin Quiñonero-Candela. Counterfactual reasoning and learning systems: the example of computational advertising. Journal of Machine Learning Research, vol. 14, pp. 3207-3260 (2013).
Jens Kober, Jan Peters. Policy Search for Motor Primitives in Robotics. Neural Information Processing Systems, vol. 21, pp. 849-856 (2008). DOI: 10.1007/978-3-319-03194-1_4