Online convex optimization in the bandit setting: gradient descent without a gradient

作者: H. Brendan McMahan , Adam Tauman Kalai , Abraham D. Flaxman

DOI: 10.5555/1070432.1070486

关键词:

摘要: We study a general online convex optimization problem. have set S and an unknown sequence of cost functions c1, c2,..., in each period, we choose feasible point xt S, learn the ct(xt). If function ct is also revealed after period then, as Zinkevich shows [25], gradient descent can be used on these to get regret bounds O(√n). That is, n rounds, total incurred will O(√n) more than best single decision chosen with benefit hindsight, minx Σ ct(x).We extend this "bandit" setting, where, only ct(xt) revealed, bound expected O(n3/4).Our approach uses simple approximation that computed from evaluating at (random) point. show biased estimate sufficient approximate functions. In other words, it possible use without seeing anything value The guarantees hold even most case: against adaptive adversary.For linear problem [15], algorithms low regrets bandit setting recently been given oblivious [1] adversaries [19]. contrast algorithms, which distinguish between explicit explore exploit periods, our algorithm interpreted doing small amount exploration period.

参考文章(22)
O. N. Granichin, Randomized Algorithms for Stochastic Approximation under Arbitrary Disturbances Automation and Remote Control. ,vol. 63, pp. 209- 219 ,(2002) , 10.1023/A:1014291407082
Eiji Takimoto, Manfred K. Warmuth, Path Kernels and Multiplicative Updates conference on learning theory. pp. 74- 89 ,(2002) , 10.1007/3-540-45435-7_6
John R. Birge, Franois Louveaux, Introduction to Stochastic Programming ,(2011)
H. Brendan McMahan, Avrim Blum, Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary Learning Theory. pp. 109- 123 ,(2004) , 10.1007/978-3-540-27819-1_8
John N. Tsitsiklis, Dimitri P. Bertsekas, Neuro-dynamic programming ,(1996)
David P. Helmbold, Robert E. Schapire, Yoram Singer, Manfred K. Warmuth, ON-LINE PORTFOLIO SELECTION USING MULTIPLICATIVE UPDATES Mathematical Finance. ,vol. 8, pp. 325- 347 ,(1998) , 10.1111/1467-9965.00058
Dimitris Bertsimas, Santosh Vempala, Solving convex programs by random walks symposium on the theory of computing. pp. 109- 115 ,(2002) , 10.1145/509907.509926