Author: Shun-ichi Amari
DOI: 10.1162/089976698300017746
Keywords:
Abstract: When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind multichannel deconvolution). The dynamical behavior of natural gradient online learning is analyzed and proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters. This suggests that the plateau phenomenon, which appears in the backpropagation learning algorithm of multilayer perceptrons, might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.
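For orientation, the natural gradient rescales the ordinary gradient by the inverse Fisher information matrix: the update is θ ← θ − η G(θ)⁻¹ ∇L(θ), where G(θ) is the Fisher information matrix serving as the Riemannian metric of the parameter space. Below is a minimal NumPy sketch of one such update step; the function name, the damping term, and the assumption that G is supplied directly (rather than estimated from per-sample score vectors) are illustrative choices, not from the paper.

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.01, damping=1e-6):
    """One natural-gradient update: theta <- theta - lr * G^{-1} grad.

    theta  : current parameter vector
    grad   : ordinary gradient of the loss at theta
    fisher : Fisher information matrix G(theta) (assumed given here;
             in practice it is estimated, e.g. from score vectors)
    """
    # Damping keeps G invertible when it is ill-conditioned (illustrative choice).
    G = fisher + damping * np.eye(len(theta))
    # Solve G x = grad instead of forming an explicit inverse.
    nat_grad = np.linalg.solve(G, grad)
    return theta - lr * nat_grad

# Toy usage with a hypothetical 3-parameter model
theta = np.zeros(3)
grad = np.array([0.5, -0.2, 0.1])
fisher = 2.0 * np.eye(3)  # stand-in Fisher matrix for illustration
theta = natural_gradient_step(theta, grad, fisher)
```

When G is the identity, this reduces to ordinary gradient descent; the geometry only matters when the metric is non-Euclidean, which is exactly the setting the abstract describes.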