Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods

Authors: Francis Bach, Robert M. Gower, Nicolas Le Roux

DOI:

Keywords: Algorithm; Tracking (particle physics); Range (mathematics); Variance (accounting); Diagonal; Matrix (mathematics); Control variates; Mathematical optimization; Convergence (routing); Hessian matrix; Computer science

Abstract: Our goal is to improve variance reducing stochastic methods through better control variates. We first propose a modification of SVRG which uses the Hessian to track gradients over time, rather than to recondition, increasing the correlation of the control variates and leading to faster theoretical convergence close to the optimum. We then propose accurate and computationally efficient approximations to the Hessian, both using a diagonal and a low-rank matrix. Finally, we demonstrate the effectiveness of our method on a wide range of problems.
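
The idea of tracking gradients with the Hessian can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an L2-regularized logistic regression objective, uses the exact per-example Hessian rather than the diagonal or low-rank approximations mentioned in the abstract, and all function names, the step size, and the epoch count are hypothetical choices. The control variate for example i is taken to be the first-order Taylor expansion of its gradient around a reference point `w_ref`, which reduces to the plain SVRG control variate when the Hessian term is dropped.

```python
import numpy as np

def grad_i(w, a, b, lam):
    # Gradient of log(1 + exp(-b * a.w)) + (lam / 2) * ||w||^2 for one example.
    s = 1.0 / (1.0 + np.exp(b * (a @ w)))
    return -b * s * a + lam * w

def hess_i(w, a, b, lam):
    # Hessian of the same per-example loss (labels b are assumed to be +1 or -1).
    s = 1.0 / (1.0 + np.exp(b * (a @ w)))
    return s * (1.0 - s) * np.outer(a, a) + lam * np.eye(a.size)

def full_grad(w, A, y, lam):
    return np.mean([grad_i(w, A[i], y[i], lam) for i in range(len(y))], axis=0)

def full_hess(w, A, y, lam):
    return np.mean([hess_i(w, A[i], y[i], lam) for i in range(len(y))], axis=0)

def tracked_estimate(w, i, A, y, lam, w_ref, g_ref, H_ref):
    # Control variate: Taylor expansion of the i-th gradient around w_ref.
    d = w - w_ref
    z_i = grad_i(w_ref, A[i], y[i], lam) + hess_i(w_ref, A[i], y[i], lam) @ d
    # Its average over all examples, available in closed form.
    z_mean = g_ref + H_ref @ d
    # Unbiased estimate of the full gradient at w.
    return grad_i(w, A[i], y[i], lam) - z_i + z_mean

def run(A, y, lam=0.1, step=0.05, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, dim = A.shape
    w = np.zeros(dim)
    for _ in range(epochs):
        # Refresh the reference point, full gradient, and full Hessian.
        w_ref = w.copy()
        g_ref = full_grad(w_ref, A, y, lam)
        H_ref = full_hess(w_ref, A, y, lam)
        for _ in range(n):
            i = rng.integers(n)
            w = w - step * tracked_estimate(w, i, A, y, lam, w_ref, g_ref, H_ref)
    return w
```

Averaging `tracked_estimate` over all indices i recovers the full gradient at `w` exactly, so the estimator is unbiased by construction. Forming full d-by-d Hessians as done here is only practical in small dimensions, which is presumably why the abstract turns to diagonal and low-rank approximations.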

References (14)
Jacek Gondzio, Robert Mansel Gower. Action constrained quasi-Newton methods. arXiv: Optimization and Control (2014).
Reza Babanezhad, Aaron Defazio, Anoop Sarkar, Mohamed Osama Ahmed, Mark Schmidt, Ann Clifton. Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields. arXiv: Machine Learning (2015).
Herbert Robbins, Sutton Monro. A Stochastic Approximation Method. Annals of Mathematical Statistics, vol. 22, pp. 400-407 (1951). 10.1214/AOMS/1177729586
R. Fletcher, M. J. D. Powell. A Rapidly Convergent Descent Method for Minimization. The Computer Journal, vol. 6, pp. 163-168 (1963). 10.1093/COMJNL/6.2.163
Rong Jin, Mehrdad Mahdavi, Lijun Zhang. Linear Convergence with Condition Number Independent Access of Full Gradients. Neural Information Processing Systems, vol. 26, pp. 980-988 (2013).
Bruce Christianson. Automatic Hessians by reverse accumulation. IMA Journal of Numerical Analysis, vol. 12, pp. 135-150 (1992). 10.1093/IMANUM/12.2.135
W. C. Davidon. Variance algorithm for minimization. The Computer Journal, vol. 10, pp. 406-410 (1968). 10.1093/COMJNL/10.4.406
Robert M. Gower, Peter Richtárik. Randomized Quasi-Newton Updates are Linearly Convergent Matrix Inversion Algorithms. arXiv: Numerical Analysis (2016).
Julien Mairal. Optimization with First-Order Surrogate Functions. arXiv: Machine Learning (2013).