Improved Algorithms for Linear Stochastic Bandits

Keywords: Algorithm, Logarithm, High probability, Thompson sampling, Simple (abstract algebra), Mathematical optimization, Mathematics, Regret, Constant (mathematics)

Abstract: … The linear bandit problem … Goal: design an algorithm (i.e., a sequential choice of the arms) minimizing the regret: …

References (30)

Sham M. Kakade, Thomas P. Hayes, Varsha Dani. Stochastic Linear Optimization Under Bandit Feedback. Conference on Learning Theory, pp. 355-366 (2008).

Karthik Sridharan, Ofer Dekel, Claudio Gentile. Robust Selective Sampling from Single and Multiple Teachers. Conference on Learning Theory, pp. 346-358 (2010).

Ji-guang Sun, G. W. Stewart. Matrix Perturbation Theory (1990).

Robert E. Schapire, Wei Chu, Lihong Li, Lev Reyzin. Contextual Bandits with Linear Payoff Functions. International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 208-214 (2011).

Víctor De la Peña, Tze Leung Lai, Qi-Man Shao. Self-Normalized Processes: Limit Theory and Statistical Applications (2001).

Nicolo Cesa-Bianchi, Gabor Lugosi. Prediction, Learning, and Games (2006).

Aurélien Garivier, Eric Moulines. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems. arXiv: Statistics Theory (2008).

Herbert Robbins. Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society, vol. 58, pp. 527-535 (1952). DOI: 10.1090/S0002-9904-1952-09620-8

Tze Leung Lai, Ching Zong Wei. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems. Annals of Statistics, vol. 10, pp. 154-166 (1982). DOI: 10.1214/AOS/1176345697

Robert Kleinberg, Alexandru Niculescu-Mizil, Yogeshwer Sharma. Regret Bounds for Sleeping Experts and Bandits. Machine Learning, vol. 80, pp. 245-272 (2010). DOI: 10.1007/S10994-010-5178-7
