Contextual Bandit Algorithms with Supervised Learning Guarantees

作者: Robert E. Schapire , Alina Beygelzimer , Lihong Li , Lev Reyzin , John Langford

DOI:

关键词:

摘要: We address the problem of competing with any large set N policies in nonstochastic bandit setting, where learner must repeatedly select among K actions but observes only reward chosen action.

参考文章(23)
Matthew J. Streeter, H. Brendan McMahan, Tighter Bounds for Multi-Armed Bandits with Expert Advice conference on learning theory. ,(2009)
Sham M. Kakade, Alexander Rakhlin, Peter L. Bartlett, Thomas P. Hayes, Ambuj Tewari, Varsha Dani, High-probability regret bounds for bandit online linear optimization conference on learning theory. pp. 335- 342 ,(2008)
Nicolo Cesa-Bianchi, Gabor Lugosi, Prediction, learning, and games ,(2006)
Shai Shalev-Shwartz, Dávid Pál, Shai Ben-David, Agnostic Online Learning. conference on learning theory. ,(2009)
David Haussler, Philip M Long, A GENERALIZATION OF SAUER''S LEMMA Journal of Combinatorial Theory, Series A. ,vol. 71, pp. 219- 240 ,(1995) , 10.1016/0097-3165(95)90001-2
Leslie Pack Kaelbling, Associative Reinforcement Learning: Functions in k -DNF Machine Learning. ,vol. 15, pp. 279- 298 ,(1994) , 10.1023/A:1022689909846
Herbert Robbins, Some aspects of the sequential design of experiments Bulletin of the American Mathematical Society. ,vol. 58, pp. 527- 535 ,(1952) , 10.1090/S0002-9904-1952-09620-8
T.L Lai, Herbert Robbins, Asymptotically efficient adaptive allocation rules Advances in Applied Mathematics. ,vol. 6, pp. 4- 22 ,(1985) , 10.1016/0196-8858(85)90002-8
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire, The Nonstochastic Multiarmed Bandit Problem SIAM Journal on Computing. ,vol. 32, pp. 48- 77 ,(2003) , 10.1137/S0097539701398375
David A. Freedman, On Tail Probabilities for Martingales Annals of Probability. ,vol. 3, pp. 100- 118 ,(1975) , 10.1214/AOP/1176996452