Structural Causal Bandits with Non-Manipulable Variables

Abstract: Causal knowledge is sought after throughout data-driven fields due to its explanatory power and its potential value to inform decision-making. If the targeted system is well understood in terms of its causal components, one is able to design more precise and surgical interventions so as to bring about certain desired outcomes. The idea of leveraging the causal understanding of a system to improve decision-making has been studied in the literature under the rubric of structural causal bandits (Lee and Bareinboim, 2018). In this setting, (1) pulling an arm corresponds to performing a causal intervention on a set of variables, while (2) the associated rewards are governed by the underlying causal mechanisms. One key assumption of this work is that any observed variable (X) in the system is manipulable, which means that intervening and making do(X = x) is always realizable. In many real-world scenarios, however, this is too stringent a requirement. For instance, while scientific evidence may support that obesity shortens life, it is not feasible to manipulate obesity directly; it can only be influenced, for example, by decreasing the amount of soda consumption (Pearl, 2018). In this paper, we study a relaxed version of the structural causal bandit problem in which not all variables are manipulable. Specifically, we develop a procedure that takes as argument partially specified causal knowledge and identifies the possibly-optimal arms in structural causal bandits with non-manipulable variables. We further introduce an algorithm that uncovers non-trivial dependence structure among the possibly-optimal arms. Finally, we corroborate our findings with simulations, which show that MAB solvers enhanced with causal knowledge and the newly discovered dependence structure among arms consistently outperform causal-insensitive solvers.
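To make the setting concrete, the following is a minimal sketch of how the arm set shrinks when some variables cannot be intervened on. It is a brute-force enumeration for illustration only, not the paper's procedure for identifying possibly-optimal arms. The function name `enumerate_arms`, the binary domains, and the variable `exercise` are assumptions introduced here; `soda` and `obesity` echo the example in the abstract.

```python
from itertools import combinations, product


def enumerate_arms(domains, non_manipulable=frozenset()):
    """List every arm as a dict {variable: value}.

    An arm is an intervention do(X = x) on a subset X of the manipulable
    variables. The empty dict stands for the observational arm (no intervention).
    """
    # Only variables that can actually be set by the agent may appear in an arm.
    manipulable = sorted(v for v in domains if v not in non_manipulable)
    arms = []
    for size in range(len(manipulable) + 1):
        for subset in combinations(manipulable, size):
            # One arm per joint value assignment to the chosen subset.
            for values in product(*(domains[v] for v in subset)):
                arms.append(dict(zip(subset, values)))
    return arms


# Hypothetical toy system with three binary variables.
domains = {"soda": (0, 1), "obesity": (0, 1), "exercise": (0, 1)}

all_arms = enumerate_arms(domains)
feasible_arms = enumerate_arms(domains, non_manipulable={"obesity"})

print(len(all_arms), len(feasible_arms))  # prints "27 9"
```

With three binary variables there are 3^3 = 27 arms when everything is manipulable, and 3^2 = 9 once `obesity` is excluded. The paper's contribution concerns which of the remaining arms can possibly be optimal given the causal structure, which this enumeration does not attempt.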


