Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory

Authors: Lawrence Carin, Changyou Chen, Ruiyi Zhang, Jianyi Zhang

Abstract: Particle-optimization-based sampling (POS) is a recently developed, effective sampling technique that interactively updates a set of particles. A representative algorithm is Stein variational gradient descent (SVGD). We prove that, under certain conditions, SVGD experiences a theoretical pitfall, i.e., particles tend to collapse. As a remedy, we generalize POS to a stochastic setting by injecting random noise into the particle updates, thus yielding stochastic particle-optimization sampling (SPOS). Notably, for the first time, we develop a non-asymptotic convergence theory for the SPOS framework (related to SVGD), characterizing convergence in terms of the 1-Wasserstein distance with respect to the numbers of particles and iterations. Somewhat surprisingly, with the same number of updates (not too large) for each particle, our theory suggests that adopting more particles does not necessarily lead to a better approximation of the target distribution, due to the limited computational budget and numerical errors. This phenomenon is also observed in SVGD and verified via an experiment on synthetic data. Extensive experimental results verify our theory and demonstrate the effectiveness of the proposed framework.
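
The idea of adding random noise to an SVGD-style particle update can be illustrated with a minimal NumPy sketch. This is not the paper's exact SPOS update: the RBF kernel, the fixed bandwidth, the Langevin-style noise scaling `sqrt(2 * step)`, and all function and parameter names (`svgd_direction`, `noisy_particle_step`, `noise_scale`) are assumptions made here for illustration. Setting `noise_scale=0` recovers a plain, deterministic SVGD step.

```python
import numpy as np


def rbf_kernel(x, bandwidth):
    """RBF kernel matrix and its gradient for particles x of shape (n, d).

    Returns k with k[j, i] = k(x_j, x_i), and grad_k with
    grad_k[j, i] = d k(x_j, x_i) / d x_j (shape (n, n, d)).
    """
    diff = x[:, None, :] - x[None, :, :]            # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    grad_k = -diff * k[:, :, None] / bandwidth ** 2
    return k, grad_k


def svgd_direction(x, grad_log_p, bandwidth):
    """Standard SVGD direction:
    phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * grad_log_p(x_j) + grad_{x_j} k(x_j, x_i)].
    """
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, bandwidth)
    attraction = k.T @ grad_log_p(x)    # pulls particles toward high density
    repulsion = grad_k.sum(axis=0)      # pushes particles away from each other
    return (attraction + repulsion) / n


def noisy_particle_step(x, grad_log_p, step, bandwidth, noise_scale, rng):
    """One SVGD step plus injected Gaussian noise.

    ASSUMPTION: the noise term noise_scale * sqrt(2 * step) * N(0, I) is a
    Langevin-style choice for illustration, not the update from the paper.
    """
    drift = svgd_direction(x, grad_log_p, bandwidth)
    noise = noise_scale * np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x + step * drift + noise


if __name__ == "__main__":
    # Toy target: standard Gaussian, so grad log p(x) = -x.
    rng = np.random.default_rng(0)
    particles = rng.standard_normal((50, 2)) + 5.0
    for _ in range(200):
        particles = noisy_particle_step(
            particles, lambda z: -z, step=0.1, bandwidth=1.0,
            noise_scale=0.1, rng=rng,
        )
    print("particle mean:", particles.mean(axis=0))
```

In this sketch the noise is independent across particles, which is one simple way to keep particles from settling onto identical positions; how much noise is appropriate is left as a free parameter.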
