Author: Nathan Kallus
DOI:
Keywords: Mathematical economics, Control (management), Regret, Computer science
Abstract: We extend the classic multi-armed bandit (MAB) model to the setting of noncompliance, where the arm pull is a mere instrument and the treatment actually applied may differ from it, which gives rise to the instrument-armed bandit (IAB) problem. The IAB setting is relevant whenever the experimental units are human, since free will, ethics, and the law prohibit the unrestricted or forced application of treatment. In particular, it models dynamic clinical trials and other controlled trials on human interventions. Nonetheless, this setting has not been fully investigated in the bandit literature. We show that there are various divergent notions of regret in this setting, all of which coincide only in the classic MAB setting. We characterize the behavior of these regrets and analyze standard MAB algorithms. We argue for a particular kind of regret that captures the causal effect of treatments, but show that standard MAB algorithms cannot achieve sublinear control on this regret. Instead, we develop new algorithms for the IAB problem, prove regret bounds for them, and compare them to standard MAB algorithms in numerical examples.
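To make the noncompliance setup concrete, the following is a minimal illustrative sketch (not the paper's algorithm or its regret definitions): the learner chooses an instrument, the applied treatment may differ from it according to a compliance model, and only the treatment determines the reward. All compliance probabilities, reward means, and the simple epsilon-greedy baseline below are hypothetical choices used only to show how instrument-level regret and a crude treatment-level regret can diverge.

```python
# Illustrative instrument-armed bandit simulation (hypothetical parameters).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical compliance matrix: row z gives P(treatment t | instrument z).
compliance = np.array([
    [0.8, 0.2],   # instrument 0 usually delivers treatment 0
    [0.4, 0.6],   # instrument 1 delivers treatment 1 only 60% of the time
])
reward_mean = np.array([0.3, 0.7])  # mean reward of each *treatment*

def pull(z):
    """Pull instrument z; return (applied treatment, observed reward)."""
    t = rng.choice(2, p=compliance[z])
    r = rng.normal(reward_mean[t], 0.1)
    return t, r

# Expected reward per instrument (what a standard MAB sees) vs. per treatment
# (the causal quantity): under noncompliance these generally differ.
instrument_mean = compliance @ reward_mean
print("per-instrument means:", instrument_mean)
print("per-treatment means: ", reward_mean)

# Naive epsilon-greedy over instruments: it targets regret against the best
# instrument; the treatment-level regret tracked here (a crude stand-in for a
# causal notion, not the paper's definition) keeps growing due to noncompliance.
T, eps = 5000, 0.1
counts, sums = np.zeros(2), np.zeros(2)
cum_instr_regret = cum_treat_regret = 0.0
for _ in range(T):
    if rng.random() < eps or counts.min() == 0:
        z = int(rng.integers(2))          # explore
    else:
        z = int(np.argmax(sums / counts)) # exploit empirical instrument means
    t, r = pull(z)
    counts[z] += 1
    sums[z] += r
    cum_instr_regret += instrument_mean.max() - instrument_mean[z]
    cum_treat_regret += reward_mean.max() - reward_mean[t]

print("cumulative instrument regret:", round(cum_instr_regret, 1))
print("cumulative treatment regret: ", round(cum_treat_regret, 1))
```

Running this sketch, the instrument-level regret stays modest while the treatment-level regret grows roughly linearly, which mirrors the abstract's point that regret notions coinciding in the classic MAB setting come apart once arm pulls are only instruments for treatment.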