Theoretical analysis of batch and on-line training for gradient descent learning in neural networks

Author: Takéhiko Nakama

DOI: 10.1016/J.NEUCOM.2009.05.017

Keywords:

Abstract: In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. The convergence properties of the two schemes applied to quadratic loss functions are analytically investigated. We quantify the convergence of each scheme to the optimal weight using the absolute value of the expected difference (Measure 1) and the expected squared difference (Measure 2) between the optimal weight and the weight computed by each scheme. Although on-line training has several advantages over batch training with respect to the first measure, it does not converge to the optimal weight with respect to the second measure if the variance of the per-instance gradient remains constant. However, if the variance decays exponentially, then on-line training also converges to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence for each scheme.
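As an informal illustration of the two measures (a minimal sketch, not the paper's analysis), the following Python snippet simulates batch and on-line gradient descent on a one-dimensional quadratic loss with noisy per-instance gradients and estimates Measure 1 and Measure 2 by Monte Carlo. All constants (w_star, eta, sigma, the training-set size, and the epoch and run counts) and the noise model are assumptions made for the demonstration, not values taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (chosen for the demonstration, not from the paper)
w_star = 2.0     # optimal weight of the quadratic loss
eta = 0.1        # learning rate
n_train = 50     # training-set size (per-instance gradients per epoch)
n_epochs = 50    # passes over the training set
sigma = 1.0      # std. dev. of the per-instance gradient noise
n_runs = 500     # Monte Carlo repetitions used to estimate the measures

def final_weight(scheme):
    """Run one training trajectory and return the weight after n_epochs."""
    w = 0.0
    for _ in range(n_epochs):
        if scheme == "batch":
            # One update per epoch using the gradient averaged over the set.
            noise = rng.normal(0.0, sigma, n_train)
            w -= eta * np.mean((w - w_star) + noise)
        else:
            # On-line training: one update per training instance.
            for _ in range(n_train):
                w -= eta * ((w - w_star) + rng.normal(0.0, sigma))
    return w

for scheme in ("batch", "on-line"):
    finals = np.array([final_weight(scheme) for _ in range(n_runs)])
    measure1 = abs(np.mean(w_star - finals))    # |E[w* - w_T]|
    measure2 = np.mean((w_star - finals) ** 2)  # E[(w* - w_T)^2]
    print(f"{scheme:8s}  Measure 1 ~ {measure1:.4f}   Measure 2 ~ {measure2:.4f}")

In this simplified setting, the on-line estimate of Measure 2 plateaus at a level governed by eta and sigma when the per-instance gradient variance stays constant, while the batch estimate is driven down by averaging the per-instance gradients over the training set, mirroring the qualitative contrast described in the abstract.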
