Causal Bandits: Learning Good Interventions via Causal Inference

作者: Tor Lattimore , Mark D. Reid , Finnian Lattimore

DOI:

关键词: ExploitMachine learningArtificial intelligenceCausal informationCausal modelRegretPsychological interventionCausal inferenceMathematicsFormalism (philosophy of mathematics)

摘要: We study the problem of using causal models to improve rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and inference model novel type bandit feedback that is not exploited by existing approaches. propose new algorithm exploits prove bound on its simple regret strictly better (in all quantities) than algorithms do use additional information.

参考文章(29)
Csaba Szepesvári, Rémi Munos, Lihong Li, On Minimax Optimal Offline Policy Evaluation. arXiv: Artificial Intelligence. ,(2014)
Alexandre B. Tsybakov, Introduction to Nonparametric Estimation ,(2008)
Nir Friedman, Daniel L. Koller, Probabilistic graphical models : principles and techniques The MIT Press. ,(2009)
Jean-Yves Audibert, Sébastien Bubeck, Rémi Munos, Best Arm Identification in Multi-Armed Bandits conference on learning theory. pp. 41- 53 ,(2010)
Eyal Even-Dar, Shie Mannor, Yishay Mansour, PAC Bounds for Multi-armed Bandit and Markov Decision Processes conference on learning theory. pp. 255- 270 ,(2002) , 10.1007/3-540-45435-7_18
Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Pure exploration in multi-armed bandits problems algorithmic learning theory. pp. 23- 37 ,(2009) , 10.1007/978-3-642-04414-4_7
Gábor Bartók, Dean P. Foster, Dávid Pál, Alexander Rakhlin, Csaba Szepesvári, Partial Monitoring—Classification, Regret Bounds, and Algorithms Mathematics of Operations Research. ,vol. 39, pp. 967- 997 ,(2014) , 10.1287/MOOR.2014.0663
Herbert Robbins, Some aspects of the sequential design of experiments Bulletin of the American Mathematical Society. ,vol. 58, pp. 527- 535 ,(1952) , 10.1090/S0002-9904-1952-09620-8
Torben Hagerup, Christine Rüb, A guided tour of Chernoff bounds Information Processing Letters. ,vol. 33, pp. 305- 308 ,(1990) , 10.1016/0020-0190(90)90214-I
Robert Schapire, Lihong Li, Satyen Kale, Alekh Agarwal, John Langford, Daniel Hsu, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits international conference on machine learning. pp. 1638- 1646 ,(2014)