Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions

作者: Francis R. Bach , Alexandre Défossez

DOI:

关键词:

摘要: We consider the least-squares regression problem and provide a detailed asymptotic analysis of performance averaged constant-step-size stochastic gradient descent. In strongly-convex case, we an expansion up to explicit exponentially decaying terms. Our leads new insights into approximation algorithms: (a) it gives tighter bound on allowed step-size; (b) generalization error may be divided variance term which is as O(1/n), independently step-size , bias that decays O(1/ 2 n ); (c) when allowing non-uniform sampling examples over dataset, choice good density depends trade-off between variance: dominates, optimal densities do not lead much gain, while can choose larger step-sizes significant improvements.

参考文章(22)
B. T. Polyak, A. B. Juditsky, Acceleration of stochastic approximation by averaging Siam Journal on Control and Optimization. ,vol. 30, pp. 838- 855 ,(1992) , 10.1137/0330046
Yu. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems Siam Journal on Optimization. ,vol. 22, pp. 341- 362 ,(2012) , 10.1137/100802001
L�on Bottou, Yann Le Cun, On-line learning for very large data sets Applied Stochastic Models in Business and Industry. ,vol. 21, pp. 137- 151 ,(2005) , 10.1002/ASMB.538
Léon Bottou, Olivier Bousquet, The Tradeoffs of Large Scale Learning neural information processing systems. ,vol. 20, pp. 161- 168 ,(2007)
F. Perronnin, Z. Akata, Z. Harchaoui, C. Schmid, Towards good practice in large-scale learning for image classification computer vision and pattern recognition. pp. 3482- 3489 ,(2012) , 10.1109/CVPR.2012.6248090
Francis R. Bach, Eric Moulines, Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning neural information processing systems. ,vol. 24, pp. 451- 459 ,(2011)
Francis Bach, Eric Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) neural information processing systems. ,vol. 26, pp. 773- 781 ,(2013)
N. Bershad, Analysis of the normalized LMS algorithm with Gaussian inputs IEEE Transactions on Acoustics, Speech, and Signal Processing. ,vol. 34, pp. 793- 806 ,(1986) , 10.1109/TASSP.1986.1164914
Deanna Needell, Nathan Srebro, Rachel Ward, Stochastic gradient descent and the randomized Kaczmarz algorithm. ,(2013)