Bandits and Experts in Metric Spaces

作者: Robert Kleinberg , Aleksandrs Slivkins , Eli Upfal

DOI: 10.1145/3299873

关键词:

摘要: In a multi-armed bandit problem, an online algorithm chooses from set of strategies in sequence trials to maximize the total payoff chosen strategies. While performance algorithms with small finite strategy is well understood, problems large sets are still topic active investigation, motivated by practical applications, such as auctions and web advertisement. The goal research identify broad natural classes functions that enable design efficient solutions.In this work, we study general setting for which form metric space, function satisfies Lipschitz condition respect metric. We refer problem MAB problem. present solution setting. That is, every define isometry invariant bounds below comes arbitrarily close meeting bound. Furthermore, our technique gives even better results benign functions. also address full-feedback (“best expert”) version where after round payoffs all arms revealed.

参考文章(50)
Stefan Mazurkiewicz, Wacław Sierpiński, Contribution à la topologie des ensembles dénombrables Fundamenta Mathematicae. ,vol. 1, pp. 17- 27 ,(1920) , 10.4064/FM-1-1-17-27
Levente Kocsis, Csaba Szepesvári, Bandit Based Monte-Carlo Planning Lecture Notes in Computer Science. pp. 282- 293 ,(2006) , 10.1007/11871842_29
Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp, Bandit problems with infinitely many arms Annals of Statistics. ,vol. 25, pp. 2103- 2116 ,(1997) , 10.1214/AOS/1069362389
Peter Auer, Ronald Ortner, UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem Periodica Mathematica Hungarica. ,vol. 61, pp. 55- 65 ,(2010) , 10.1007/S10998-010-3055-6
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, Manfred K. Warmuth, How to use expert advice Journal of the ACM. ,vol. 44, pp. 427- 485 ,(1997) , 10.1145/258128.258179
Sariel Har-Peled, Manor Mendel, Fast construction of nets in low dimensional metrics, and their applications symposium on computational geometry. pp. 150- 158 ,(2005) , 10.1145/1064092.1064117
Herbert Robbins, Some aspects of the sequential design of experiments Bulletin of the American Mathematical Society. ,vol. 58, pp. 527- 535 ,(1952) , 10.1090/S0002-9904-1952-09620-8
T.L Lai, Herbert Robbins, Asymptotically efficient adaptive allocation rules Advances in Applied Mathematics. ,vol. 6, pp. 4- 22 ,(1985) , 10.1016/0196-8858(85)90002-8
Rajeev Agrawal, The Continuum-Armed Bandit Problem Siam Journal on Control and Optimization. ,vol. 33, pp. 1926- 1951 ,(1995) , 10.1137/S0363012992237273