Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

作者： Olivier Teytaud , Nicolas Baskiotis , Sylvain Gelly , Cédric Hartland , Michèle Sebag

DOI:

关键词:

摘要: Motivated by realtime website optimization, this paper is about online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined order to handle dynamic multi-armed bandits, and specifically cope with fast variations rewards. Firstly, a change point detection test based on Page-Hinkley statistics used overcome limitations due inertia. Secondly, controlled forgetting strategy dubbed Meta-Bandit proposed take care Exploration vs Exploitation trade-off when PH triggered. Extensive empirical validation shows significant improvements compared baseline algorithms. The also investigates sensitivity respect number available options.

inria.fr 本地加速

inria.fr PDF 下载加速

参考文章(17)

Csaba Szepesvári, Levente Kocsis, Reduced-Variance Payoff Estimation in Adversarial Bandit Problems ,(2005)

Nicolo Cesa-Bianchi, Gabor Lugosi, Prediction, learning, and games ,(2006)

David V. Hinkley, Inference in Two-Phase Regression Journal of the American Statistical Association. ,vol. 66, pp. 736- 743 ,(1971) , 10.1080/01621459.1971.10482337

G. Lorden, PROCEDURES FOR REACTING TO A CHANGE IN DISTRIBUTION Annals of Mathematical Statistics. ,vol. 42, pp. 1897- 1908 ,(1971) , 10.1214/AOMS/1177693055

Michèle Basseville, Detecting changes in signals and systems—a survey Automatica. ,vol. 24, pp. 309- 326 ,(1988) , 10.1016/0005-1098(88)90073-8

George V. Moustakides, Optimal stopping times for detecting changes in distributions Annals of Statistics. ,vol. 14, pp. 1379- 1387 ,(1986) , 10.1214/AOS/1176350164

E. S. PAGE, CONTINUOUS INSPECTION SCHEMES Biometrika. ,vol. 41, pp. 100- 115 ,(1954) , 10.1093/BIOMET/41.1-2.100

David V. Hinkley, Inference about the change-point in a sequence of binomial variables Biometrika. ,vol. 57, pp. 1- 17 ,(1970) , 10.1093/BIOMET/57.1.1

D. V. HINKLEY, Inference about the change-point from cumulative sum tests Biometrika. ,vol. 58, pp. 509- 523 ,(1971) , 10.1093/BIOMET/58.3.509

10.

Graham Cormode, S. Muthukrishnan, An improved data stream summary: the count-min sketch and its applications Journal of Algorithms. ,vol. 55, pp. 58- 75 ,(2005) , 10.1016/J.JALGOR.2003.12.001

Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

来源期刊

我的账户

Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

来源期刊

相似文章 10

我的账户