Pure exploration in multi-armed bandits problems

Authors: Sébastien Bubeck, Rémi Munos, Gilles Stoltz

DOI: 10.1007/978-3-642-04414-4_7

Keywords:

Abstract: … We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that perform an online exploration of the arms. The …

References (22)
Guido Sanguinetti, Neil D. Lawrence, Missing Data in Kernel PCA. Lecture Notes in Computer Science, pp. 751–758 (2006). DOI: 10.1007/11871842_76
Karl H. Schlag, ELEVEN – Tests Needed for a Recommendation. Research Papers in Economics (2006)
Omid Madani, Daniel J. Lizotte, Russell Greiner, The Budgeted Multi-armed Bandit Problem. Learning Theory, pp. 643–645 (2004). DOI: 10.1007/978-3-540-27819-1_46
Luc Devroye, Gábor Lugosi, Combinatorial Methods in Density Estimation (2011)
Rémi Munos, Sébastien Bubeck, Gilles Stoltz, Pure Exploration for Multi-Armed Bandit Problems. arXiv: Statistics Theory (2008)
Levente Kocsis, Csaba Szepesvári, Bandit Based Monte-Carlo Planning. Lecture Notes in Computer Science, pp. 282–293 (2006). DOI: 10.1007/11871842_29
Eyal Even-Dar, Shie Mannor, Yishay Mansour, PAC Bounds for Multi-armed Bandit and Markov Decision Processes. Conference on Learning Theory, pp. 255–270 (2002). DOI: 10.1007/3-540-45435-7_18
Olivier Teytaud, Rémi Munos, Sylvain Gelly, Yizao Wang, Modification of UCT with Patterns in Monte-Carlo Go. INRIA (2006)
Herbert Robbins, Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, vol. 58, pp. 527–535 (1952). DOI: 10.1090/S0002-9904-1952-09620-8
T. L. Lai, Herbert Robbins, Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, vol. 6, pp. 4–22 (1985). DOI: 10.1016/0196-8858(85)90002-8