Fast exact multiplication by the Hessian

Author: Barak A. Pearlmutter

DOI: 10.1162/NECO.1994.6.1.147

Keywords:

Abstract: Just storing the Hessian H (the matrix of second derivatives ∂²E/∂w_i∂w_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator R_v{f(w)} = (∂/∂r) f(w + rv)|_{r=0}, note that R_v{∇_w} = Hv and R_v{w} = v, and then apply R_v{·} to the equations used to compute ∇_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
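The recipe in the abstract (apply R_v{·} to every equation of the gradient computation) can be illustrated on a toy case. The following is a minimal sketch, not the paper's own code: it assumes a tiny network y = Σ_j a_j tanh(Σ_i W_ji x_i) with error E = ½(y − t)², and the names `grad_and_hv`, `finite_difference_hv`, `VW` and `va` are hypothetical, chosen for this illustration. Each line of the forward and backward pass is paired with its R-transformed version, using R_v{W} = VW, R_v{a} = va, and R_v{x} = R_v{t} = 0.

```python
import math

def grad_and_hv(W, a, x, t, VW, va):
    """Gradient of E and the product H v, for E = 0.5 * (y - t)**2 with
    y = sum_j a[j] * tanh(sum_i W[j][i] * x[i]); v is given as (VW, va)."""
    hid = range(len(a))
    # Forward pass, with the R-transformed quantity next to each original one.
    s = [sum(w * xi for w, xi in zip(W[j], x)) for j in hid]
    Rs = [sum(v * xi for v, xi in zip(VW[j], x)) for j in hid]  # R{W} = VW, R{x} = 0
    h = [math.tanh(s[j]) for j in hid]
    Rh = [(1.0 - h[j] ** 2) * Rs[j] for j in hid]
    e = sum(a[j] * h[j] for j in hid) - t                # dE/dy
    Re = sum(va[j] * h[j] + a[j] * Rh[j] for j in hid)   # R{dE/dy}, since R{t} = 0
    # Backward pass, transformed the same way (product rule on every line).
    dh = [e * a[j] for j in hid]
    Rdh = [Re * a[j] + e * va[j] for j in hid]
    ds = [dh[j] * (1.0 - h[j] ** 2) for j in hid]
    Rds = [Rdh[j] * (1.0 - h[j] ** 2) - 2.0 * h[j] * Rh[j] * dh[j] for j in hid]
    grad = ([[ds[j] * xi for xi in x] for j in hid], [e * h[j] for j in hid])
    hv = ([[Rds[j] * xi for xi in x] for j in hid],
          [Re * h[j] + e * Rh[j] for j in hid])
    return grad, hv

def finite_difference_hv(W, a, x, t, VW, va, eps=1e-5):
    """Approximate H v by a central difference of the gradient (check only)."""
    def grad_at(step):
        Ws = [[w + step * v for w, v in zip(rw, rv)] for rw, rv in zip(W, VW)]
        a_s = [ai + step * vi for ai, vi in zip(a, va)]
        return grad_and_hv(Ws, a_s, x, t, VW, va)[0]
    (pW, pa), (mW, ma) = grad_at(eps), grad_at(-eps)
    fdW = [[(p - m) / (2 * eps) for p, m in zip(rp, rm)] for rp, rm in zip(pW, mW)]
    fda = [(p - m) / (2 * eps) for p, m in zip(pa, ma)]
    return fdW, fda

# Arbitrary illustrative values (assumptions, not taken from the paper).
W = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
a = [0.7, -0.8]
x = [1.0, 0.5, -1.5]
t = 0.25
VW = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]
va = [-1.0, 0.3]

_, (HvW, Hva) = grad_and_hv(W, a, x, t, VW, va)
fdW, fda = finite_difference_hv(W, a, x, t, VW, va)
max_diff = max(
    max(abs(p - q) for rp, rq in zip(HvW, fdW) for p, q in zip(rp, rq)),
    max(abs(p - q) for p, q in zip(Hva, fda)),
)
print("largest |Hv - finite difference| entry:", max_diff)
```

The transformed pass mirrors the gradient pass line for line, which is why its cost stays within a small constant factor of one gradient evaluation. The finite-difference routine is included only as a sanity check; unlike the R-operator version it is approximate and sensitive to the choice of `eps`.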

References (37)
Yann LeCun, S. Becker. Improving the convergence of back-propagation learning with second-order methods. Morgan Kaufmann, pp. 29-37 (1989).
F.J. Von Zuben, M.L. de Andrade Netto. Second-order training for recurrent neural networks without teacher-forcing. International Conference on Neural Networks, vol. 2, pp. 801-806 (1995). DOI: 10.1109/ICNN.1995.487520
John Skilling. The Eigenvalues of Mega-dimensional Matrices. Springer Netherlands, pp. 455-466 (1989). DOI: 10.1007/978-94-015-7860-8_48
P. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. dissertation, Harvard University (1974).
Alan H. Barr, Kurt W. Fleischer, Douglas Kerns, David B. Kirk. Analog VLSI Implementation of Gradient Descent. Neural Information Processing Systems, pp. 789-796 (1992).
Luis B. Almeida. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. Artificial Neural Networks, pp. 102-111 (1990).
Martin F. Møller. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning. DAIMI Report Series, vol. 19 (1990). DOI: 10.7146/DPB.V19I339.6570
Fernando J. Pineda. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, vol. 59, pp. 2229-2232 (1987). DOI: 10.1103/PHYSREVLETT.59.2229
Peter M. Williams. Bayesian regularization and pruning using a Laplace prior. Neural Computation, vol. 7, pp. 117-143 (1995). DOI: 10.1162/NECO.1995.7.1.117