Abstract: Just storing the Hessian $H$ (the matrix of second derivatives $\partial^2 E / \partial w_i \partial w_j$ of the error $E$ with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like $H$ is to compute its product with various vectors, we derive a technique that directly calculates $Hv$, where $v$ is an arbitrary vector. To calculate $Hv$, we first define a differential operator $\mathcal{R}_v\{f(w)\} = (\partial/\partial r)\, f(w + rv)\big|_{r=0}$, note that $\mathcal{R}_v\{\nabla_w\} = Hv$ and $\mathcal{R}_v\{w\} = v$, and then apply $\mathcal{R}_v\{\cdot\}$ to the equations used to compute $\nabla_w$. The result is an exact and numerically stable procedure for computing $Hv$, which takes about as much computation, and is about as local, as a gradient evaluation. We apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of $H$, obviating any need to calculate the full Hessian.
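
As a minimal illustration of the identity $\mathcal{R}_v\{\nabla_w E\} = Hv$, the sketch below computes an exact Hessian-vector product by forward-mode differentiation of the gradient (forward-over-reverse autodiff) in JAX, at roughly the cost of one extra gradient evaluation. The toy quadratic loss, data, and names (`loss`, `hvp`) are illustrative assumptions, not from the paper itself.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Toy error E(w): mean squared error of a linear model (illustrative only).
    return jnp.mean((x @ w - y) ** 2)

def hvp(f, w, v):
    # Hv = (d/dr) grad f(w + r v) |_{r=0}: the R_v operator applied to the
    # gradient, realized as a jvp (forward-mode) through the reverse-mode grad.
    return jax.jvp(jax.grad(f), (w,), (v,))[1]

# Usage sketch on random data.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 3))
y = jax.random.normal(key, (8,))
w = jnp.zeros(3)
v = jnp.ones(3)   # arbitrary direction vector
f = lambda w: loss(w, x, y)
print(hvp(f, w, v))  # exact H v, without ever forming the full Hessian
```

Note that the full $n \times n$ Hessian is never materialized; only gradient-sized vectors are stored, which is the point of the technique for large networks.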