Author: Takéhiko Nakama
DOI: 10.1016/J.NEUCOM.2009.05.017
Keywords:
Abstract: In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. The convergence properties of the two schemes applied to quadratic loss functions are analytically investigated. We quantify the convergence of each scheme to the optimal weight using the absolute value of the expected difference (Measure 1) and the expected squared difference (Measure 2) between the optimal weight and the weight computed by each scheme. Although on-line training has several advantages over batch training with respect to the first measure, it does not converge to the optimal weight with respect to the second measure if the variance of the per-instance gradient remains constant. However, if this variance decays exponentially, then on-line training also converges to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence for each scheme.
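The sketch below is a minimal numerical illustration of the contrast described in the abstract, not the paper's analysis: it compares batch and on-line gradient descent on a one-dimensional quadratic loss and estimates empirical analogues of Measure 1 and Measure 2 over repeated runs. The 1-D setting, the constant learning rate, and all variable names (e.g. `targets`, `eta`, `w_opt`) are assumptions chosen for illustration.

```python
# Minimal sketch (not from the paper): batch vs. on-line gradient descent on a
# 1-D quadratic loss L(w) = mean_i (w - t_i)^2 / 2, whose optimal weight is mean(t).
# "Measure 1" ~ |E[w_final - w*]|, "Measure 2" ~ E[(w_final - w*)^2],
# both estimated over independent runs with shuffled instance order.
import numpy as np

rng = np.random.default_rng(0)
targets = rng.normal(loc=1.0, scale=0.5, size=50)  # training instances
w_opt = targets.mean()                             # minimizer of the quadratic loss
eta, epochs, runs = 0.1, 100, 200                  # constant learning rate (assumed)

def batch_train(t, eta, epochs):
    w = 0.0
    for _ in range(epochs):
        w -= eta * np.mean(w - t)          # full-batch gradient step
    return w

def online_train(t, eta, epochs, rng):
    w = 0.0
    for _ in range(epochs):
        for ti in rng.permutation(t):      # per-instance (on-line) updates
            w -= eta * (w - ti)
    return w

batch_w = np.array([batch_train(targets, eta, epochs) for _ in range(runs)])
online_w = np.array([online_train(targets, eta, epochs, rng) for _ in range(runs)])

for name, w in [("batch", batch_w), ("on-line", online_w)]:
    m1 = abs(np.mean(w - w_opt))           # empirical analogue of Measure 1
    m2 = np.mean((w - w_opt) ** 2)         # empirical analogue of Measure 2
    print(f"{name:8s}  Measure 1 ~ {m1:.2e}  Measure 2 ~ {m2:.2e}")
```

With a constant learning rate, the on-line scheme's squared deviation (Measure 2) stays bounded away from zero because of the per-instance gradient variance, whereas the batch scheme drives both measures toward zero, which is consistent with the qualitative behavior the abstract describes.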