Structural Causal Bandits with Non-Manipulable Variables

Authors: Sanghack Lee, Elias Bareinboim

DOI: 10.1609/AAAI.V33I01.33014164

Keywords:

Abstract: Causal knowledge is sought after throughout data-driven fields due to its explanatory power and potential value to inform decision-making. If the targeted system is well-understood in terms of its causal components, one is able to design more precise and surgical interventions so as to bring certain desired outcomes about. The idea of leveraging the causal understanding of a system to improve decision-making has been studied in the literature under the rubric of structural causal bandits (Lee and Bareinboim, 2018). In this setting, (1) pulling an arm corresponds to performing a causal intervention on a set of variables, while (2) the associated rewards are governed by the underlying causal mechanisms. One key assumption of this work is that any observed variable (X) in the system is manipulable, which means that intervening and making do(X = x) is always realizable. In many real-world scenarios, however, this requirement is too stringent. For instance, while scientific evidence may support that obesity shortens life, it's not feasible to manipulate obesity directly, but only indirectly, for example, by decreasing the amount of soda consumption (Pearl, 2018). In this paper, we study a relaxed version of the structural causal bandit problem when not all variables are manipulable. Specifically, we develop a procedure that takes as argument partially specified causal knowledge and identifies the possibly-optimal arms in structural bandits with non-manipulable variables. We further introduce an algorithm that uncovers non-trivial dependence structure among the possibly-optimal arms. Finally, we corroborate our findings with simulations, which show that MAB solvers enhanced with causal knowledge and leveraging the newly discovered dependence structure among arms consistently outperform causal-insensitive solvers.
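The setting described in the abstract, where each arm is an intervention on a set of variables and some variables cannot be intervened on, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy model Z → X → Y with a hidden confounder U, the variable names, the binary domains, and the helper names `enumerate_arms` and `pull`. It only shows how the arm space shrinks when X is non-manipulable; it is not the paper's procedure for identifying possibly-optimal arms.

```python
import itertools
import random

# Hypothetical binary domains for two observed variables; Y is the reward.
DOMAINS = {"Z": (0, 1), "X": (0, 1)}


def enumerate_arms(manipulable, domains):
    """List every arm do(S = s) for each subset S of the manipulable variables.

    The empty dict stands for the arm that intervenes on nothing.
    """
    arms = []
    for size in range(len(manipulable) + 1):
        for subset in itertools.combinations(manipulable, size):
            for values in itertools.product(*(domains[v] for v in subset)):
                arms.append(dict(zip(subset, values)))
    return arms


def pull(arm, rng):
    """Sample a binary reward Y from an assumed toy model Z -> X -> Y.

    U is an unobserved confounder of X and Y. An intervened variable takes
    the value fixed by the arm instead of following its own mechanism.
    """
    u = int(rng.random() < 0.5)
    z = arm["Z"] if "Z" in arm else int(rng.random() < 0.5)
    x = arm["X"] if "X" in arm else z ^ u
    noise = int(rng.random() < 0.1)  # flips the reward with probability 0.1
    return int(x == u) ^ noise


all_arms = enumerate_arms(["Z", "X"], DOMAINS)   # both variables manipulable
restricted_arms = enumerate_arms(["Z"], DOMAINS)  # X is non-manipulable
reward = pull(restricted_arms[1], random.Random(0))
```

With both variables manipulable the sketch yields 9 arms; when only Z can be intervened on, 3 remain: `{}`, `{"Z": 0}`, and `{"Z": 1}`. A bandit solver would then choose among these arms by repeatedly calling `pull`.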

References (26)
Sham M Kakade, Thomas P Hayes, Varsha Dani. Stochastic Linear Optimization Under Bandit Feedback. Conference on Learning Theory, pp. 355-366 (2008).
Judea Pearl, Jin Tian. Studies in causal reasoning and learning. University of California, Los Angeles (2002).
Olivier Cappé, Aurélien Garivier. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. arXiv: Statistics Theory (2011).
Thomas S Verma, Judea Pearl. Equivalence and synthesis of causal models. Uncertainty in Artificial Intelligence, pp. 255-270 (1990).
Robert J Tibshirani, Bradley Efron. An introduction to the bootstrap (1993).
Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, vol. 58, pp. 527-535 (1952). DOI: 10.1090/S0002-9904-1952-09620-8
Stefan Magureanu, Alexandre Proutiere, Richard Combes. Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms. Conference on Learning Theory, pp. 975-999 (2014).
Judea Pearl. Causal diagrams for empirical research. Biometrika, vol. 82, pp. 669-688 (1995). DOI: 10.1093/BIOMET/82.4.669