Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

Authors: Hal Daumé, Chicheng Zhang, Alekh Agarwal, Sahand N Negahban, John Langford

Abstract: We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may be different between these two data sources. Theoretically, we state and prove no-regret algorithms for learning that is robust to misaligned cost distributions between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approach is both feasible and helpful in practice.
