Natural gradient works efficiently in learning

Author: Shun-ichi Amari

DOI: 10.1162/089976698300017746

Abstract: When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution). The dynamical behavior of natural gradient online learning is analyzed and is proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters. This suggests that the plateau phenomenon, which appears in the backpropagation learning algorithm of multilayer perceptrons, might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.
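
The central idea of the abstract, replacing the ordinary gradient with the gradient premultiplied by the inverse Fisher information matrix, can be illustrated with a minimal sketch. The model is an assumption chosen for illustration and does not come from the paper: a one-dimensional Gaussian with parameters (mu, sigma), whose Fisher matrix is diag(1/sigma^2, 2/sigma^2). The function names and the 1/t learning rate are likewise hypothetical choices; the adaptive learning-rate method mentioned in the abstract is not implemented here.

```python
def nll_grad(x, mu, sigma):
    """Ordinary gradient of -log N(x; mu, sigma^2) with respect to (mu, sigma)."""
    d_mu = -(x - mu) / sigma**2
    d_sigma = 1.0 / sigma - (x - mu) ** 2 / sigma**3
    return d_mu, d_sigma


def natural_grad(x, mu, sigma):
    """Natural gradient: the ordinary gradient times the inverse Fisher matrix.

    For the (mu, sigma) parameterization of a Gaussian (an illustrative
    assumption), G = diag(1/sigma^2, 2/sigma^2), so
    G^{-1} = diag(sigma^2, sigma^2 / 2).
    """
    d_mu, d_sigma = nll_grad(x, mu, sigma)
    return sigma**2 * d_mu, 0.5 * sigma**2 * d_sigma


def online_natural_gradient(samples, mu=0.0, sigma=1.0):
    """One pass of online learning, one sample per step, learning rate 1/t."""
    for t, x in enumerate(samples, start=1):
        g_mu, g_sigma = natural_grad(x, mu, sigma)
        eta = 1.0 / t  # assumed schedule, not the paper's adaptive rule
        mu -= eta * g_mu
        sigma = max(sigma - eta * g_sigma, 1e-6)  # guard: keep sigma positive
    return mu, sigma
```

In this toy setting the natural-gradient update for mu reduces to mu + (x - mu) / t, which is the running sample mean, that is, the batch maximum-likelihood estimate of the mean. This is a small instance of the online rule matching batch estimation, not a demonstration of the paper's general result.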

References (76)
H. Sompolinsky, N. Barkai, H. S. Seung, On-line Learning of Dichotomies: Algorithms and Learning Curves. Neural Information Processing Systems (1995)
C. Radhakrishna Rao, Information and the Accuracy Attainable in the Estimation of Statistical Parameters. Bull. Calcutta Math. Soc., vol. 37, pp. 235-247 (1992), 10.1007/978-1-4612-0919-5_16
S. Amari, M. Kawanabe, Estimating Functions in Semiparametric Statistical Models. Institute of Mathematical Statistics, pp. 65-82 (1997), 10.1214/LNMS/1215455039
E. Oja, J. Karhunen, Signal Separation by Nonlinear Hebbian Learning. IEEE Press (1995)
L.-Q. Zhang, A. Cichocki, S. Amari, Multichannel blind deconvolution of non-minimum phase systems using information backpropagation. International Conference on Neural Information Processing, vol. 1, pp. 210-216 (1999), 10.1109/ICONIP.1999.843988
A. Cichocki, L. Zhang, Adaptive multichannel blind deconvolution using state-space models. Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, SPW-HOS '99, pp. 296-299 (1999), 10.1109/HOST.1999.778746
S. Amari, Learning and statistical inference. The Handbook of Brain Theory and Neural Networks, pp. 522-526 (1998)
S. C. Douglas, A. Cichocki, S. Amari, Multichannel blind separation and deconvolution of sources with arbitrary distributions. Neural Networks for Signal Processing VII, Proceedings of the 1997 IEEE Signal Processing Society Workshop, pp. 436-445 (1997), 10.1109/NNSP.1997.622425
B.-C. Ihm, D.-J. Park, Acceleration of learning speed in neural networks by reducing weight oscillations. International Joint Conference on Neural Networks, vol. 3, pp. 1729-1732 (1999), 10.1109/IJCNN.1999.832637