On the Noisy Gradient Descent that Generalizes as SGD

Authors: Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman

Keywords: Covariance, Matrix (mathematics), Algorithm, Gaussian, Computer science, Deep learning, Artificial intelligence, Noise, Generalization, Sampling (statistics), Gradient noise, Regularization (mathematics), Gradient descent

Abstract: … the impact of the noise class. On the other hand, thanks to the flexibility of choosing noise class, we are allowed to use noisy gradient descent with best fitted noises based on practical …
