From external to internal regret

作者: Avrim Blum , Yishay Mansour

DOI: 10.1007/11503415_42

关键词:

摘要: External regret compares the performance of an online algorithm, selecting among N actions, to best those actions in hindsight. Internal loss algorithm a modified which consistently replaces one action by another. In this paper, we give simple generic reduction that, given for external problem, converts it efficient internal problem. We provide methods that work both full information model, every is observed at each time step, and partial (bandit) where step only selected observed. The importance game theory due fact general game, if player has sublinear regret, then empirical frequencies converge correlated equilibrium. For also derive quantitative bound very setting includes arbitrary set modification rules (that possibly modify algorithm) selection functions (each giving different weight step). rule difference between cost costs are weighted function. This can be viewed as generalization previously-studied sleeping experts setting.

参考文章(32)
Sergiu Hart, Andreu Mas-Colell, A Reinforcement Procedure Leading to Correlated Equilibrium World Scientific Book Chapters. pp. 181- 200 ,(2001) , 10.1007/978-3-662-04623-4_12
Peter Auer, Claudio Gentile, Adaptive and Self-Confident On-Line Learning Algorithms conference on learning theory. pp. 107- 117 ,(2000)
Nicolò Cesa-Bianchi, Gábor Lugosi, Potential-Based Algorithms in On-Line Prediction and Game Theory Machine Learning. ,vol. 51, pp. 239- 261 ,(2003) , 10.1023/A:1022901500417
Gilles Stoltz, Gábor Lugosi, Internal Regret in On-Line Portfolio Selection Machine Learning. ,vol. 59, pp. 125- 159 ,(2005) , 10.1007/S10994-005-0465-4
Nicolò Cesa-Bianchi, Gábor Lugosi, Gilles Stoltz, Regret Minimization Under Partial Monitoring Mathematics of Operations Research. ,vol. 31, pp. 562- 580 ,(2006) , 10.1287/MOOR.1060.0206
William W. Cohen, Yoram Singer, Context-sensitive learning methods for text categorization ACM Transactions on Information Systems. ,vol. 17, pp. 141- 173 ,(1999) , 10.1145/306686.306688
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, Manfred K. Warmuth, How to use expert advice Journal of the ACM. ,vol. 44, pp. 427- 485 ,(1997) , 10.1145/258128.258179
David Blackwell, An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics. ,vol. 6, pp. 1- 8 ,(1956) , 10.2140/PJM.1956.6.1
Yoav Freund, Robert E Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting conference on learning theory. ,vol. 55, pp. 119- 139 ,(1997) , 10.1006/JCSS.1997.1504