Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring

Authors: Anoop Korattikara, Max Welling, Sungjin Ahn

Abstract: In this paper we address the following question: "Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data items for every sample we generate?". An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was found to be slow. By leveraging the Bayesian Central Limit Theorem, we extend SGLD so that at high mixing rates it will sample from a normal approximation of the posterior, while at slow mixing rates it will mimic the behavior of SGLD with a pre-conditioner matrix. As a bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic gradients) and as such is an efficient optimizer during burn-in.
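To make the abstract's two ingredients concrete, here is a minimal sketch of a plain SGLD step alongside a simplified Fisher-preconditioned variant that captures the Fisher-scoring flavor described above. The function signature, the running-average weight `kappa`, and the damping constant are illustrative assumptions; the paper's actual SGFS update uses carefully derived step and noise constants that this sketch does not reproduce.

```python
import numpy as np

def sgld_fisher_step(theta, minibatch, grad_log_prior, grad_log_lik,
                     N, eps, fisher_est, kappa, rng, precondition=False):
    """One mini-batch Langevin step, optionally preconditioned by a running
    empirical Fisher estimate (an illustrative, simplified SGFS-style variant,
    not the paper's exact update)."""
    n, d = len(minibatch), theta.size
    # Per-item score vectors on the mini-batch: shape (n, d).
    scores = grad_log_lik(theta, minibatch)
    g_bar = scores.mean(axis=0)
    # Unbiased stochastic estimate of the full-data log-posterior gradient.
    g = grad_log_prior(theta) + N * g_bar
    # Running estimate of the empirical Fisher information (score covariance).
    centered = scores - g_bar
    fisher_est = (1.0 - kappa) * fisher_est + kappa * (centered.T @ centered) / n
    if precondition:
        # Fisher-scoring-style step: precondition both the gradient and the
        # injected noise by the inverse (scaled, damped) Fisher matrix.
        C = N * fisher_est + 1e-6 * np.eye(d)
        step = 0.5 * eps * np.linalg.solve(C, g)
        noise = rng.multivariate_normal(np.zeros(d), eps * np.linalg.inv(C))
        return theta + step + noise, fisher_est
    # Plain SGLD: half-step along the gradient plus N(0, eps) noise.
    noise = rng.normal(scale=np.sqrt(eps), size=d)
    return theta + 0.5 * eps * g + noise, fisher_est
```

Treating the preconditioner as constant within a step keeps the preconditioned branch a valid Langevin update; a fully position-dependent preconditioner would require an extra correction term, which this sketch omits.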
