Dynamic of Stochastic Gradient Descent with State-dependent Noise

Authors: Tie-Yan Liu, Qi Meng, Shiqi Gong, Zhi-Ming Ma, Wei Chen

Abstract: … with state-dependent noise. Specifically, we show that the covariance of the noise of SGD in … Inspired by our theory, we propose to add additional state-dependent noise into (large-batch…

References (47)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. International Conference on Computer Vision, pp. 1026-1034 (2015), 10.1109/ICCV.2015.123
Gérard Ben Arous, Anna Choromanska, Yann LeCun, Mikael Henaff, Michael Mathieu, The Loss Surfaces of Multilayer Networks. International Conference on Artificial Intelligence and Statistics, vol. 38, pp. 192-204 (2015)
Ran Guo, Jiulin Du, Are power-law distributions an equilibrium distribution or a stationary nonequilibrium distribution? Physica A: Statistical Mechanics and its Applications, vol. 406, pp. 281-286 (2014), 10.1016/J.PHYSA.2014.03.056
Yanjun Zhou, Jiulin Du, Kramers escape rate in overdamped systems with the power-law distribution. Physica A: Statistical Mechanics and its Applications, vol. 402, pp. 299-305 (2014), 10.1016/J.PHYSA.2014.01.065
David A. McAllester, PAC-Bayesian model averaging. Conference on Learning Theory, pp. 164-170 (1999), 10.1145/307400.307435
N. G. Van Kampen, William P. Reinhardt, Stochastic processes in physics and chemistry (1981)
Alexander Rakhlin, Ohad Shamir, Karthik Sridharan, Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. International Conference on Machine Learning, pp. 1571-1578 (2012)
Léon Bottou, Olivier Bousquet, The Tradeoffs of Large Scale Learning. Neural Information Processing Systems, vol. 20, pp. 161-168 (2007)
Tom Schaul, Yann LeCun, Sixin Zhang, No more pesky learning rates. International Conference on Machine Learning, pp. 343-351 (2013)