Authors: Tor Lattimore, Mark D. Reid, Finnian Lattimore
DOI:
Keywords: Exploit, Machine learning, Artificial intelligence, Causal information, Causal model, Regret, Psychological intervention, Causal inference, Mathematics, Formalism (philosophy of mathematics)
Abstract: We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.