On the Noisy Gradient Descent that Generalizes as SGD

Authors: Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman

DOI:

Keywords: Covariance, Matrix (mathematics), Algorithm, Gaussian, Computer science, Deep learning, Artificial intelligence, Noise, Generalization, Sampling (statistics), Gradient noise, Regularization (mathematics), Gradient descent

Abstract: … the impact of the noise class. On the other hand, thanks to the flexibility of choosing noise class, we are allowed to use noisy gradient descent with best fitted noises based on practical …

