A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression

Authors: Paul Sajda, Wotao Yin, Jianing Shi, Stanley Osher

DOI:

Keywords:

Abstract: l1-regularized logistic regression, also known as sparse logistic regression, is widely used in machine learning, computer vision, data mining, bioinformatics and neural signal processing. The use of l1 regularization attributes attractive properties to the classifier, such as feature selection and robustness to noise, and, as a result, classifier generality in the context of supervised learning. When a sparse logistic regression problem has large-scale data in high dimensions, it is computationally expensive to minimize the non-differentiable l1-norm in the objective function. Motivated by recent work (Koh et al., 2007; Hale et al., 2008), we propose a novel hybrid algorithm based on combining two types of optimization iterations: one being very fast and memory friendly while the other is slower but more accurate. Called hybrid iterative shrinkage (HIS), the resulting algorithm is comprised of a fixed point continuation phase and an interior point phase. The first phase is based completely on memory-efficient operations such as matrix-vector multiplications, while the second phase is based on a truncated Newton's method. Furthermore, we show that various optimization techniques, including line search and continuation, can significantly accelerate convergence. The algorithm has global convergence at a geometric rate (a Q-linear rate in optimization terminology). We present a numerical comparison with several existing algorithms, including an analysis using benchmark data from the UCI machine learning repository, and show that our algorithm is the most computationally efficient without loss of accuracy.
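The first (shrinkage) phase described in the abstract relies only on matrix-vector products and a soft-thresholding step. The following is a minimal illustrative sketch of such a fixed-point iterative shrinkage loop for l1-regularized logistic regression; it is not the authors' HIS implementation, and the step size, tolerances, and variable names are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrinkage operator: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def logistic_grad(X, y, w):
    """Gradient of the average logistic loss for labels y in {-1, +1}."""
    z = np.clip(y * (X @ w), -500, 500)      # clip to avoid overflow in exp
    s = -y / (1.0 + np.exp(z))
    return X.T @ s / X.shape[0]

def iterative_shrinkage(X, y, lam, tau=0.1, n_iter=500, tol=1e-6):
    """Fixed-point iteration w <- shrink(w - tau * grad(w), tau * lam).
    Each step costs only two matrix-vector products, which is what makes
    this phase fast and memory friendly."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w_new = soft_threshold(w - tau * logistic_grad(X, y, w), tau * lam)
        if np.linalg.norm(w_new - w) <= tol * max(np.linalg.norm(w), 1.0):
            return w_new
        w = w_new
    return w

# Small synthetic example (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50); w_true[:5] = 1.0
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))
w_hat = iterative_shrinkage(X, y, lam=0.05)
print("nonzero coefficients:", np.count_nonzero(np.abs(w_hat) > 1e-4))
```

In the full HIS scheme described above, a more accurate interior point phase (truncated Newton) would take over once the sparse support has stabilized; the sketch shows only the cheap first phase.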

References (60)
David L. Donoho, Iain M. Johnstone, Gérard Kerkyacharian, Dominique Picard, "Wavelet Shrinkage: Asymptopia?", Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, pp. 301–337, 1995. doi:10.1111/J.2517-6161.1995.TB02032.X
Mark Schmidt, Glenn Fung, Rómer Rosales, "Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches", European Conference on Machine Learning, pp. 286–297, 2007. doi:10.1007/978-3-540-74958-5_28
Andrew Y. Ng, "On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples", International Conference on Machine Learning, pp. 404–412, 1998.
Vladimir Naumovich Vapnik, Estimation of Dependences Based on Empirical Data, 2010.
Joshua T. Goodman, "Exponential priors for maximum entropy models", North American Chapter of the Association for Computational Linguistics, pp. 305–312, 2005.
David Madigan, Alexander Genkin, David D. Lewis, Dmitriy Fradkin, "Bayesian Multinomial Logistic Regression for Author Identification", Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 25th International Workshop, vol. 803, pp. 509–516, 2005. doi:10.1063/1.2149832
Boris T. Poljak, Introduction to Optimization, Optimization Software, Publications Division, 1987.
Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006.
James Theiler, Simon Perkins, "Online feature selection using grafting", International Conference on Machine Learning, pp. 592–599, 2003.
Hui Zou, Trevor Hastie, Robert Tibshirani, "Sparse Principal Component Analysis", Journal of Computational and Graphical Statistics, vol. 15, pp. 265–286, 2006. doi:10.1198/106186006X113430