Fast Laplace approximation for sparse Bayesian spike and slab models

Authors: Shandian Zhe, Yuan Qi, Yifan Yang, Syed Abbas Z. Naqvi, Jieping Ye

DOI:

Keywords:

Abstract: We consider the application of Bayesian spike-and-slab models to high-dimensional feature selection problems. To this end, we propose a simple yet effective fast approximate inference algorithm based on Laplace's method. We exploit two efficient optimization methods, GIST [Gong et al., 2013] and L-BFGS [Nocedal, 1980], to obtain the mode of the posterior distribution. We then use an ensemble Nyström approach to calculate the diagonal of the inverse Hessian over the posterior marginals in O(knp) time, where k ≪ p. Furthermore, we provide theoretical analysis of the estimation consistency and the approximation error bounds. Given the estimated model weights, we use quadrature integration to estimate the marginal posterior probabilities of the indicator variables for all features, which quantify the selection uncertainty. Our method not only maintains the benefits of the Bayesian treatment (e.g., uncertainty quantification) but also possesses the computational efficiency and oracle properties of frequentist methods. Simulations show that our estimates are better than or comparable to those of alternative methods such as VB and EP, with less running time. Extensive experiments on large real datasets demonstrate that our method often improves prediction accuracy over automatic relevance determination and L1-type methods.
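The abstract describes a two-stage procedure: first find the posterior mode with an efficient optimizer, then build a Gaussian (Laplace) approximation around that mode to obtain marginal posterior variances. The sketch below is a minimal, hypothetical illustration of this idea for a linear model with a plain Gaussian ("slab") prior; it uses SciPy's L-BFGS-B in place of the paper's GIST step and computes the exact diagonal of the inverse Hessian rather than the ensemble Nyström approximation, and the quadrature step for the indicator-variable marginals is omitted. All names (X, y, w_map, sigma2, tau2) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a Laplace approximation for a sparse Bayesian linear model.
# NOT the authors' implementation: the prior is a simple Gaussian slab, and the
# exact inverse Hessian is used instead of the paper's ensemble Nystrom
# approximation (which matters only when the dimension p is large).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]                      # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(n)

sigma2, tau2 = 0.01, 1.0                           # noise and slab variances

def neg_log_post(w):
    resid = y - X @ w
    return 0.5 * resid @ resid / sigma2 + 0.5 * w @ w / tau2

def grad(w):
    return -X.T @ (y - X @ w) / sigma2 + w / tau2

# Step 1: obtain the posterior mode with L-BFGS (the paper also uses GIST to
# handle the non-smooth spike-and-slab penalty; L-BFGS suffices for this
# smooth surrogate).
res = minimize(neg_log_post, np.zeros(p), jac=grad, method="L-BFGS-B")
w_map = res.x

# Step 2: Laplace approximation -- a Gaussian centred at the mode whose
# covariance is the inverse Hessian of the negative log posterior.
H = X.T @ X / sigma2 + np.eye(p) / tau2            # Hessian at the mode
diag_cov = np.diag(np.linalg.inv(H))               # marginal posterior variances

for j in range(5):
    print(f"w[{j}]: mode={w_map[j]:+.3f}, posterior sd={np.sqrt(diag_cov[j]):.3f}")
```

In the paper's setting, the diagonal of the inverse Hessian is what the ensemble Nyström step approximates in O(knp) time with k ≪ p; the exact inversion above is only feasible for small p.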

References (22)
José Miguel Hernández-Lobato, Balancing flexibility and robustness in machine learning: semi-parametric methods and sparse linear models. Universidad Autónoma de Madrid (2010)
James C. Bezdek, Richard J. Hathaway, Convergence of alternating optimization. Neural, Parallel & Scientific Computations, vol. 11, pp. 351-368 (2003), 10.5555/964885.964886
Hemant Ishwaran, J. Sunil Rao, Spike and slab variable selection: Frequentist and Bayesian strategies. Annals of Statistics, vol. 33, pp. 730-773 (2005), 10.1214/009053604000001147
A. Ahmed, E. P. Xing, Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences of the United States of America, vol. 106, pp. 11878-11883 (2009), 10.1073/PNAS.0901910106
Andreas Rosenwald, George Wright, Wing C. Chan, Joseph M. Connors, Elias Campo, Richard I. Fisher, Randy D. Gascoyne, H. Konrad Muller-Hermelink, Erlend B. Smeland, Jena M. Giltnane, Elaine M. Hurt, Hong Zhao, Lauren Averett, Liming Yang, Wyndham H. Wilson, Elaine S. Jaffe, Richard Simon, Richard D. Klausner, John Powell, Patricia L. Duffey, Dan L. Longo, Timothy C. Greiner, Dennis D. Weisenburger, Warren G. Sanger, Bhavana J. Dave, James C. Lynch, Julie Vose, James O. Armitage, Emilio Montserrat, Armando López-Guillermo, Thomas M. Grogan, Thomas P. Miller, Michel LeBlanc, German Ott, Stein Kvaloy, Jan Delabie, Harald Holte, Peter Krajci, Trond Stokke, Louis M. Staudt, The Use of Molecular Profiling to Predict Survival after Chemotherapy for Diffuse Large-B-Cell Lymphoma. The New England Journal of Medicine, vol. 346, pp. 1937-1947 (2002), 10.1056/NEJMOA012914
T. J. Mitchell, J. J. Beauchamp, Bayesian Variable Selection in Linear Regression. Journal of the American Statistical Association, vol. 83, pp. 1023-1032 (1988), 10.1080/01621459.1988.10478694
Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Alberto Suárez, Expectation Propagation for microarray data classification. Pattern Recognition Letters, vol. 31, pp. 1618-1626 (2010), 10.1016/J.PATREC.2010.05.007
Arthur E. Hoerl, Robert W. Kennard, Ridge regression: biased estimation for nonorthogonal problems. Technometrics, vol. 42, pp. 80-86 (2000), 10.2307/1271436
Jorge Nocedal, Updating Quasi-Newton Matrices With Limited Storage. Mathematics of Computation, vol. 35, pp. 773-782 (1980), 10.1090/S0025-5718-1980-0572855-7