Instrument-Armed Bandits

Author: Nathan Kallus

Keywords: Mathematical economics, Control (management), Regret, Computer science

Abstract: We extend the classic multi-armed bandit (MAB) model to the setting of noncompliance, where the arm pull is a mere instrument and the treatment applied may differ from it, which gives rise to the instrument-armed bandit (IAB) problem. The IAB setting is relevant whenever the experimental units are human, since free will, ethics, and the law may prohibit unrestricted or forced application of treatment. In particular, the setting is relevant in bandit models of dynamic clinical trials and other controlled trials on human interventions. Nonetheless, the setting has not been fully investigated in the bandit literature. We show that there are various divergent notions of regret in this setting, all of which coincide only in the classic MAB setting. We characterize the behavior of these regrets and analyze standard MAB algorithms. We argue for a particular kind of regret that captures the causal effect of treatments, but show that standard MAB algorithms cannot achieve sublinear control on this regret. Instead, we develop new algorithms for the IAB problem, prove new regret bounds for them, and compare them to standard MAB algorithms in numerical examples.
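
To make the noncompliance setting concrete, the following is a minimal sketch of a two-arm environment in which the arm pulled acts only as an instrument. It is not taken from the paper: the compliance types, their probabilities, the reward means, and the names `pull` and `run_ucb1` are all illustrative assumptions. The learner is plain UCB1 (Auer et al., 2002) applied to arm pulls, standing in for a "standard MAB algorithm", not for the new algorithms the abstract mentions.

```python
import math
import random

# Hypothetical environment: every number below is an illustrative assumption.
COMPLIANCE_PROBS = {"complier": 0.6, "always_0": 0.2, "always_1": 0.2}
MEAN_REWARD = {0: 0.3, 1: 0.7}  # Bernoulli mean of the treatment actually applied


def pull(instrument, rng):
    """Pull arm `instrument`; return (treatment actually applied, reward)."""
    kinds = list(COMPLIANCE_PROBS)
    weights = list(COMPLIANCE_PROBS.values())
    kind = rng.choices(kinds, weights=weights)[0]
    if kind == "complier":
        treatment = instrument      # unit takes what was assigned
    elif kind == "always_0":
        treatment = 0               # unit takes treatment 0 regardless
    else:
        treatment = 1               # unit takes treatment 1 regardless
    reward = 1 if rng.random() < MEAN_REWARD[treatment] else 0
    return treatment, reward


def run_ucb1(horizon, seed=0):
    """Run UCB1 over arm pulls; return (pull counts, rounds with treatment != pull)."""
    rng = random.Random(seed)
    counts = [0, 0]
    sums = [0.0, 0.0]
    mismatches = 0
    for t in range(1, horizon + 1):
        if t <= 2:
            z = t - 1  # pull each arm once to initialize
        else:
            z = max(
                range(2),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        treatment, reward = pull(z, rng)
        counts[z] += 1
        sums[z] += reward
        mismatches += int(treatment != z)
    return counts, mismatches


if __name__ == "__main__":
    counts, mismatches = run_ucb1(2000)
    print("pulls per arm:", counts)
    print("rounds where treatment differed from the pull:", mismatches)
```

Under these assumed numbers, the expected reward of pulling arm 1 is 0.62 and of pulling arm 0 is 0.38, a gap of 0.24, while the gap between the treatments themselves is 0.4. A learner that only compares arm pulls therefore sees a diluted version of the treatment effect, which is one informal way to see why different notions of regret can diverge once compliance is imperfect.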

References (9)
Olivier Cappé, Aurélien Garivier, The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. arXiv: Statistics Theory (2011).
Rémi Munos, Odalric-Ambrym Maillard, Gilles Stoltz, A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. Conference on Learning Theory, pp. 18- (2011).
Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer, Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, vol. 47, pp. 235-256 (2002). DOI: 10.1023/A:1013689704352
Elias Bareinboim, Andrew Forney, Judea Pearl, Bandits with Unobserved Confounders: A Causal Approach. Neural Information Processing Systems, vol. 28, pp. 1342-1350 (2015).
Odalric-Ambrym Maillard, Shie Mannor, Latent Bandits. International Conference on Machine Learning (2014).
Tor Lattimore, Mark D. Reid, Finnian Lattimore, Causal Bandits: Learning Good Interventions via Causal Inference. arXiv: Machine Learning (2016).
Peter L. Bartlett, Shahar Mendelson, Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. European Conference on Computational Learning Theory, vol. 3, pp. 463-482 (2003). DOI: 10.1007/3-540-44581-1_15