Fast exact multiplication by the Hessian

DOI: 10.1162/NECO.1994.6.1.147

Abstract: Just storing the Hessian H (the matrix of second derivatives ∂²E/∂w_i∂w_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator R_v{f(w)} = (∂/∂r) f(w + rv)|_{r=0}, note that R_v{∇_w} = Hv and R_v{w} = v, and then apply R_v{·} to the equations used to compute ∇_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
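As an illustration of the recipe in the abstract, the sketch below applies R_v{·} line by line to the forward and backward passes of a small network. Everything specific here is an assumption made for the example and is not taken from the paper: the architecture (one tanh hidden layer, no biases), the squared-error loss, the function name grad_and_hv, and the layer sizes. The result is compared against a central difference of gradients, which approximates Hv only up to truncation and rounding error.

```python
import numpy as np

def grad_and_hv(W1, W2, V1, V2, x, t):
    """Gradient and Hessian-vector product for E = 0.5 * ||W2 tanh(W1 x) - t||^2.

    (V1, V2) is the vector v, laid out like the weights (W1, W2).
    Hypothetical helper written for this example only.
    """
    # Forward pass.
    a = W1 @ x
    h = np.tanh(a)
    y = W2 @ h
    s = 1.0 - h**2                        # tanh'(a)

    # R{.} forward pass: differentiate each forward line along v,
    # using R{W} = V and R{x} = 0.
    Ra = V1 @ x
    Rh = s * Ra
    Ry = V2 @ h + W2 @ Rh

    # Backward pass (ordinary backpropagation).
    dy = y - t
    dh = W2.T @ dy
    da = dh * s
    g1, g2 = np.outer(da, x), np.outer(dy, h)

    # R{.} backward pass: product rule applied to each backward line.
    Rdy = Ry
    Rdh = V2.T @ dy + W2.T @ Rdy
    Rda = Rdh * s + dh * (-2.0 * h * Rh)  # R{s} = -2 h R{h}
    Hv1 = np.outer(Rda, x)
    Hv2 = np.outer(Rdy, h) + np.outer(dy, Rh)
    return (g1, g2), (Hv1, Hv2)

# Small random problem (sizes chosen arbitrarily for the example).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
W1 = rng.normal(size=(n_hid, n_in))
W2 = rng.normal(size=(n_out, n_hid))
V1 = rng.normal(size=W1.shape)
V2 = rng.normal(size=W2.shape)
x = rng.normal(size=n_in)
t = rng.normal(size=n_out)

_, (Hv1, Hv2) = grad_and_hv(W1, W2, V1, V2, x, t)

# Central-difference check: (grad(w + eps v) - grad(w - eps v)) / (2 eps).
eps = 1e-5
(gp1, gp2), _ = grad_and_hv(W1 + eps * V1, W2 + eps * V2, V1, V2, x, t)
(gm1, gm2), _ = grad_and_hv(W1 - eps * V1, W2 - eps * V2, V1, V2, x, t)
fd1 = (gp1 - gm1) / (2 * eps)
fd2 = (gp2 - gm2) / (2 * eps)

print("max abs difference:", max(np.abs(Hv1 - fd1).max(), np.abs(Hv2 - fd2).max()))
```

Each R-line mirrors one line of the ordinary passes, which is why the cost stays close to that of a gradient evaluation. A routine of this shape is the primitive that an iterative method would call repeatedly instead of forming H.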

