Stochastic Variance Reduced Gradient Methods Using a Trust-Region-Like Scheme

Authors: Yu-Hong Dai, Xin-Wei Liu, Tengteng Yu, Jie Sun

DOI: 10.1007/s10915-020-01402-x


Abstract: Stochastic variance reduced gradient (SVRG) methods are important approaches to minimize the average of a large number of cost functions, a problem that arises frequently in machine learning and many other applications. In this paper, based on SVRG, we propose the SVRG-TR method, which employs a trust-region-like scheme for selecting stepsizes. It is proved that SVRG-TR is linearly convergent in expectation for smooth strongly convex functions and enjoys a faster convergence rate than SVRG methods. In order to overcome the difficulty of tuning stepsizes by hand, we combine the Barzilai–Borwein (BB) method with SVRG-TR to compute stepsizes automatically, yielding the SVRG-TR-BB method. By incorporating mini-batching with SVRG-TR and SVRG-TR-BB, respectively, we further obtain two extended methods, mSVRG-TR and mSVRG-TR-BB. Linear convergence and complexity results for mSVRG-TR are given. Numerical experiments on some standard datasets show that SVRG-TR is generally better than, or comparable to, the best-tuned modern stochastic methods, while mSVRG-TR-BB is very competitive with the mini-batch variants of recent successful stochastic methods.
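The abstract describes combining SVRG-style variance reduction with automatically computed Barzilai–Borwein stepsizes. The paper's trust-region-like stepsize rule is not specified in the abstract, so the sketch below shows only the underlying ingredients: plain SVRG on a least-squares objective, with the snapshot-based BB stepsize of the kind used in SVRG-BB. Function and variable names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch: SVRG with a Barzilai-Borwein (BB) stepsize.
# Problem: min_x (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2  (least squares).
# The trust-region-like stepsize scheme of the paper is NOT reproduced here.

def svrg_bb(A, b, epochs=30, eta0=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    grad_full = lambda x: A.T @ (A @ x - b) / n  # full gradient at snapshot
    x_tilde = np.zeros(d)                        # snapshot point
    eta = eta0                                   # initial stepsize (hand-picked)
    prev_x, prev_g = None, None
    for _ in range(epochs):
        g_tilde = grad_full(x_tilde)
        # BB stepsize from successive snapshots:
        #   eta = ||s||^2 / (m * s^T y),  s = x_k - x_{k-1},  y = g_k - g_{k-1},
        # with m = n inner iterations per epoch.
        if prev_x is not None:
            s, y = x_tilde - prev_x, g_tilde - prev_g
            denom = n * abs(s @ y)
            if denom > 1e-12:
                eta = (s @ s) / denom
        prev_x, prev_g = x_tilde.copy(), g_tilde.copy()
        x = x_tilde.copy()
        for _ in range(n):                       # inner loop: m = n stochastic steps
            i = rng.integers(n)
            gi = A[i] * (A[i] @ x - b[i])        # gradient of the i-th component at x
            gi_tilde = A[i] * (A[i] @ x_tilde - b[i])
            x -= eta * (gi - gi_tilde + g_tilde) # variance-reduced update
        x_tilde = x
    return x_tilde
```

On a small synthetic system `b = A @ x_star`, the iterates converge linearly to `x_star` without any stepsize tuning beyond the initial `eta0`, which is the practical appeal of BB-type rules that the abstract highlights.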
