Logistic regression with weight grouping priors

Authors: M. Korzeń, S. Jaroszewicz, P. Klęsk

DOI: 10.1016/J.CSDA.2013.03.013

Keywords: Generalization, Laplace operator, Applied mathematics, Elastic net regularization, Logistic regression, Feature selection, Hyperparameter, Mathematics, Statistics, Prior probability, Gaussian

Abstract: A generalization of the commonly used Maximum Likelihood based learning algorithm for the logistic regression model is considered. It is well known that using a Laplace prior (L^1 penalty) on the coefficients leads to a variable selection effect, where most of the coefficients vanish. It is argued that this is not always desirable; it is often better to group correlated variables together and assign equal weights to them. Two new kinds of a priori distributions over the weights are investigated: Gaussian Extremal Mixture (GEM) and Laplacian Extremal Mixture (LEM), which enforce grouping of the coefficients in a manner analogous to L^1 and L^2 regularization. An efficient learning algorithm is presented, which simultaneously finds the weights and the hyperparameters of those priors. Examples are shown in the experimental part where the proposed methods outperform Laplace and Gauss priors, as well as other methods which take coefficient grouping into account, such as the elastic net. Theoretical results on parameter shrinkage and sample complexity are also included.
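
As a side illustration of the selection-versus-grouping contrast described above, the following minimal sketch uses only standard scikit-learn penalties; it does not implement the paper's GEM/LEM priors or its hyperparameter-learning algorithm. The L^1 penalty (the MAP estimate under a Laplace prior) tends to retain a single coefficient from a group of strongly correlated variables, whereas the L^2 penalty (Gaussian prior) spreads weight across the group; the elastic net combines both penalty terms.

```python
# Illustrative sketch only (generic scikit-learn, NOT the paper's GEM/LEM
# priors): contrasts the L^1 variable-selection effect with the L^2
# weight-sharing effect on a group of correlated features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)  # latent signal shared by the correlated group
corr = [z + 0.05 * rng.normal(size=n) for _ in range(3)]   # 3 near-copies of z
noise = [rng.normal(size=n) for _ in range(3)]             # 3 irrelevant features
X = np.column_stack(corr + noise)
y = (z + 0.5 * rng.normal(size=n) > 0).astype(int)

# Laplace prior ~ L1 penalty; Gaussian prior ~ L2 penalty
# (C is the inverse regularization strength in scikit-learn).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

print("L1 coefficients:", np.round(l1.coef_[0], 2))  # typically keeps ~1 of the trio
print("L2 coefficients:", np.round(l2.coef_[0], 2))  # spreads weight over the trio
```

With the fairly strong regularization used here (C=0.1), the L1 fit typically zeroes out most of the correlated trio while the L2 fit assigns them roughly equal weights; the latter is the grouping behavior that the GEM and LEM priors are designed to encourage directly.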

References (27)
Martin Anthony, Peter L. Bartlett, Neural Network Learning: Theoretical Foundations (1999)
James Franklin, The elements of statistical learning: data mining, inference, and prediction, The Mathematical Intelligencer, vol. 27, pp. 83-85 (2005), 10.1007/BF02985802
Richard A. Olshen, Charles J. Stone, Leo Breiman, Jerome H. Friedman, Classification and regression trees (1983)
Sergey Bakin, Adaptive regression and model selection in data mining problems, The Australian National University (1999), 10.25911/5D78DB4C25DBB
Abdallah Mkhadri, Mohamed Ouhourane, An extended variable inclusion and shrinkage algorithm for correlated variables, Computational Statistics & Data Analysis, vol. 57, pp. 631-644 (2013), 10.1016/J.CSDA.2012.07.023
Hui Zou, Trevor Hastie, Addendum: Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B (Statistical Methodology), vol. 67, p. 768 (2005), 10.1111/J.1467-9868.2005.00527.X
Andrew Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, Twenty-First International Conference on Machine Learning (ICML '04), p. 78 (2004), 10.1145/1015330.1015435
Peter M. Williams, Bayesian regularization and pruning using a Laplace prior, Neural Computation, vol. 7, pp. 117-143 (1995), 10.1162/NECO.1995.7.1.117
J. Tabor, P. Spurek, Cross-entropy clustering, Pattern Recognition, vol. 47, pp. 3046-3059 (2014), 10.1016/J.PATCOG.2014.03.006
Robert Tibshirani, Trevor Hastie, Berwin A. Turlach, Bradley Efron, Jean Michel Loubes, Hemant Ishwaran, Robert A. Stine, Keith Knight, Sanford Weisberg, Saharon Rosset, Iain Johnstone, Pascal Massart, David Madigan, J. I. Zhu, Greg Ridgeway, Least angle regression, Annals of Statistics, vol. 32, pp. 407-499 (2004), 10.1214/009053604000000067