High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

Authors: Eric P. Xing, Wittawat Jitkrittum, Masashi Sugiyama, Leonid Sigal, Makoto Yamada

DOI: 10.1162/NECO_A_00537

Keywords: Kernel (statistics), Pattern recognition (psychology), Feature (computer vision), Lasso (statistics), Mutual information, Computer science, Feature selection, Dependency (UML), Pattern recognition, Independence (probability theory), Artificial intelligence

Abstract: The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this letter, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that with particular choices of kernel functions, nonredundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
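
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each feature and the output are represented by centered Gaussian Gram matrices, and that feature weights are found by a non-negative Lasso over the vectorized matrices. The kernel width, the standardization step, the column normalization, the solver (projected proximal gradient), and every function and parameter name here are assumptions chosen for illustration.

```python
import numpy as np

def centered_gaussian_gram(v, sigma=1.0):
    """Centered Gaussian Gram matrix H K H for a 1-D sample vector v."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    K = np.exp(-((v - v.T) ** 2) / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H

def feature_wise_kernel_lasso(X, y, lam=0.05, n_iter=500):
    """Non-negative Lasso over vectorized per-feature Gram matrices.

    X has shape (n_samples, n_features); y has shape (n_samples,).
    Returns one non-negative weight per feature (hypothetical helper).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    # Assumption: standardize so a single kernel width suits all variables.
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    ys = (y - y.mean()) / (y.std() + 1e-12)
    # One column per feature: its centered Gram matrix, flattened.
    A = np.column_stack(
        [centered_gaussian_gram(Xs[:, k]).ravel() for k in range(d)]
    )
    b = centered_gaussian_gram(ys).ravel()
    # Assumption: unit-norm columns so lam is comparable across features.
    A = A / (np.linalg.norm(A, axis=0) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    # Step size 1/L, where L is the squared largest singular value of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    alpha = np.zeros(d)
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - b)
        # Gradient step, L1 shrinkage, and projection onto alpha >= 0.
        alpha = np.maximum(alpha - step * (grad + lam), 0.0)
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((120, 6))
    y = X[:, 0] ** 2  # nonlinear dependence on feature 0 only
    print(feature_wise_kernel_lasso(X, y))
```

Because the problem is a Lasso in the feature weights, it is convex, which is consistent with the abstract's statement that the globally optimal solution can be computed efficiently. The dense n² × d matrix built here is only practical for small samples.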
