Stochastic Gradient Hamiltonian Monte Carlo

作者: Carlos Guestrin , Tianqi Chen , Emily Fox

DOI:

关键词: Matrix decompositionHamiltonian (control theory)AlgorithmHybrid Monte CarloStochastic approximationMathematical optimizationArtificial neural networkComputationLangevin dynamicsState spaceMathematics

摘要: Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity such has grown significantly recent years. However, limitation HMC is required gradient computation simulation dynamical system--such infeasible problems involving large sample size or streaming data. Instead, we must rely on noisy estimate computed from subset In this paper, explore properties stochastic approach. Surprisingly, natural implementation approximation can be arbitrarily bad. To address problem introduce variant that uses second-order Langevin dynamics friction term counteracts effects gradient, maintaining desired target distribution as invariant distribution. Results simulated data validate our theory. We also an application to classification task using neural networks and online Bayesian matrix factorization.

参考文章(23)
Ilya Sutskever, Geoffrey Hinton, James Martens, George Dahl, On the importance of initialization and momentum in deep learning international conference on machine learning. pp. 1139- 1147 ,(2013)
Mark Girolami, Ben Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods Journal of The Royal Statistical Society Series B-statistical Methodology. ,vol. 73, pp. 123- 214 ,(2011) , 10.1111/J.1467-9868.2010.00765.X
David A. Levin, Elizabeth L. Wilmer, Y. Peres, Y. Peres, Y. Peres, Markov Chains and Mixing Times ,(2008)
Ziyu Wang, Shakir Mohamed, De Freitas Nando, Adaptive Hamiltonian and Riemann Manifold Monte Carlo international conference on machine learning. pp. 1462- 1470 ,(2013)
Chris Holmes, Arnaud Doucet, R mi Bardenet, Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach international conference on machine learning. ,vol. 1, pp. 405- 413 ,(2014)
Radford M. Neal, MCMC Using Hamiltonian Dynamics arXiv: Computation. pp. 139- 188 ,(2011) , 10.1201/B10905-10
A. Nemirovski, A. Juditsky, G. Lan, A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming SIAM Journal on Optimization. ,vol. 19, pp. 1574- 1609 ,(2009) , 10.1137/070704277
Herbert Robbins, Sutton Monro, A Stochastic Approximation Method Annals of Mathematical Statistics. ,vol. 22, pp. 400- 407 ,(1951) , 10.1214/AOMS/1177729586
Alan M. Horowitz, A generalized guided Monte Carlo algorithm Physics Letters B. ,vol. 268, pp. 247- 252 ,(1991) , 10.1016/0370-2693(91)90812-5
Ming Chen Wang, G. E. Uhlenbeck, On the Theory of the Brownian Motion II Reviews of Modern Physics. ,vol. 17, pp. 323- 342 ,(1945) , 10.1103/REVMODPHYS.17.323