On the saddle point problem for non-convex optimization

Authors: Yoshua Bengio, Razvan Pascanu, Yann N. Dauphin, Surya Ganguli

DOI:

Keywords:

Abstract: … We apply this algorithm to deep neural network training, and provide preliminary numerical … deep neuronal networks, we review evidence from that literature that saddle points also play …
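The visible abstract snippet refers to an algorithm applied to deep neural network training without naming it, so the sketch below is not the paper's method or experiments. Purely as a hedged toy illustration of the saddle-point problem named in the title, it contrasts gradient descent, a plain Newton step, and a step rescaled by |H|^{-1} (the Hessian with its eigenvalues replaced by their absolute values) near a two-dimensional saddle with shallow negative curvature; all function choices and parameter values are assumptions made for the illustration.

```python
import numpy as np

# Toy 2-D objective with a saddle at the origin and very shallow negative
# curvature along y: f(x, y) = 0.5*x**2 - 0.005*y**2.
# Hypothetical illustration only, not the paper's experimental setup.

H = np.diag([1.0, -0.01])          # constant Hessian of the toy objective

def grad(p):
    x, y = p
    return np.array([x, -0.01 * y])

lr = 0.1
p_gd = np.array([1.0, 1e-3])       # start close to the saddle's attracting manifold
p_newton = p_gd.copy()
p_absH = p_gd.copy()

for _ in range(100):
    # Gradient descent: escape speed along y is set by the tiny curvature 0.01,
    # so the iterate crawls across the plateau surrounding the saddle.
    p_gd = p_gd - lr * grad(p_gd)

    # Plain Newton step H^{-1} g: treats the saddle as a target and jumps onto it.
    p_newton = p_newton - np.linalg.solve(H, grad(p_newton))

    # Rescale by |H|^{-1} (absolute eigenvalues): the negative-curvature direction
    # is descended at a rate that does not depend on how shallow the curvature is.
    w, V = np.linalg.eigh(H)
    abs_H_inv = V @ np.diag(1.0 / np.abs(w)) @ V.T
    p_absH = p_absH - lr * abs_H_inv @ grad(p_absH)

print("gradient descent:", p_gd)       # still stuck near the saddle plateau
print("plain Newton:    ", p_newton)   # converged onto the saddle at (0, 0)
print("|H|-rescaled:    ", p_absH)     # has escaped along the y direction
```

In this toy setting, gradient descent's escape rate is tied to the magnitude of the negative eigenvalue, a plain Newton step is attracted to the stationary point regardless of its type, and the absolute-value rescaling turns the negative-curvature direction into a descent direction, which is why saddle points surrounded by plateaus are problematic for the first two methods.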

References (18)
Christopher K. I. Williams, Carl Edward Rasmussen, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), The MIT Press (2005).
James Martens, Deep learning via Hessian-free optimization, International Conference on Machine Learning, pp. 735-742 (2010).
James J. Callahan, Advanced Calculus: A Geometric View (2011).
Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio, Theano: new features and speed improvements, arXiv: Symbolic Computation (2012).
Walter Murray, Newton-Type Methods, Wiley Encyclopedia of Operations Research and Management Science (2011). DOI: 10.1002/9780470400531.EORMS0569
David Saad, Sara A. Solla, On-line learning in soft committee machines, Physical Review E, vol. 52, pp. 4225-4243 (1995). DOI: 10.1103/PHYSREVE.52.4225
Alan J. Bray, David S. Dean, Statistics of critical points of Gaussian fields on large-dimensional spaces, Physical Review Letters, vol. 98, p. 150201 (2007). DOI: 10.1103/PHYSREVLETT.98.150201
Magnus Rattray, David Saad, Shun-ichi Amari, Natural gradient descent for on-line learning, Physical Review Letters, vol. 81, pp. 5461-5464 (1998). DOI: 10.1103/PHYSREVLETT.81.5461
Masato Inoue, Hyeyoung Park, Masato Okada, On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units –Steepest Gradient Descent and Natural Gradient Descent–, Journal of the Physical Society of Japan, vol. 72, pp. 805-810 (2003). DOI: 10.1143/JPSJ.72.805