Improved Algorithms for Linear Stochastic Bandits

Author: Émilie Kaufmann

DOI:

Keywords: Algorithm, Logarithm, High probability, Thompson sampling, Simple (abstract algebra), Mathematical optimization, Mathematics, Regret, Constant (mathematics)

Abstract: … The linear bandit problem … Goal: design an algorithm (i.e., a sequential choice of the arms) that minimizes the regret: …

References (30)
Sham M Kakade, Thomas P Hayes, Varsha Dani, Stochastic Linear Optimization Under Bandit Feedback. Conference on Learning Theory, pp. 355-366 (2008)
Karthik Sridharan, Ofer Dekel, Claudio Gentile, Robust Selective Sampling from Single and Multiple Teachers. Conference on Learning Theory, pp. 346-358 (2010)
Ji-guang Sun, G. W. Stewart, Matrix Perturbation Theory (1990)
Robert E. Schapire, Wei Chu, Lihong Li, Lev Reyzin, Contextual Bandits with Linear Payoff Functions. International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 208-214 (2011)
Víctor De la Peña, Tze Leung Lai, Qi-Man Shao, Self-Normalized Processes: Limit Theory and Statistical Applications (2001)
Nicolo Cesa-Bianchi, Gabor Lugosi, Prediction, Learning, and Games (2006)
Aurélien Garivier, Eric Moulines, On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems. arXiv: Statistics Theory (2008)
Herbert Robbins, Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society, vol. 58, pp. 527-535 (1952), 10.1090/S0002-9904-1952-09620-8
Robert Kleinberg, Alexandru Niculescu-Mizil, Yogeshwer Sharma, Regret Bounds for Sleeping Experts and Bandits. Machine Learning, vol. 80, pp. 245-272 (2010), 10.1007/S10994-010-5178-7