Ultrahigh Dimensional Feature Selection: Beyond The Linear Model

Authors: Jianqing Fan, Richard Samworth, Yichao Wu

DOI:

Keywords:

Abstract: Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan & Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan & Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.
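
As a rough illustration of the correlation ranking mentioned in the abstract, the sketch below ranks features by the absolute value of their marginal Pearson correlation with the response and keeps the top d. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `marginal_correlation` and `sis_screen`, the toy data, and the choice d = 10 are all hypothetical, and the iterative (ISIS) and pseudo-likelihood extensions described in the abstract are not shown.

```python
import math
import random


def marginal_correlation(x, y):
    """Pearson correlation between two equal-length sequences (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Return the sorted indices of the d features with the largest |correlation| with y."""
    scores = [abs(marginal_correlation(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy data (an assumption for illustration): p = 200 features, n = 50 samples,
# with only the first two features entering the response.
rng = random.Random(0)
n, p = 50, 200
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[0][i] - 3 * columns[1][i] + rng.gauss(0, 0.5) for i in range(n)]

selected = sis_screen(columns, y, d=10)
print(selected)
```

Because each feature is scored on its own, a feature that is marginally uncorrelated with the response but jointly important can be missed by this ranking, which is the situation the abstract gives as the motivation for the iterative variant.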

References (54)
V. N. Vapnik, The Nature of Statistical Learning Theory (1995)
Isabelle Guyon, Steve Gunn, Lotfi A. Zadeh, Masoud Nikravesh, Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag New York, Inc. (2006)
Huan Liu, Zheng Zhao, Searching for interacting features. International Joint Conference on Artificial Intelligence, pp. 1156-1161 (2007)
Robert Tibshirani, Trevor Hastie, Jerome H. Friedman, The Elements of Statistical Learning (2001)
Peter McCullagh, John Ashworth Nelder, Generalized Linear Models (1983)
James Franklin, The elements of statistical learning: data mining, inference, and prediction. The Mathematical Intelligencer, vol. 27, pp. 83-85 (2005), 10.1007/BF02985802
Mark Andrew Hall, Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. International Conference on Machine Learning, pp. 359-366 (2000)
Huan Liu, Lei Yu, Feature selection for high-dimensional data: a fast correlation-based filter solution. International Conference on Machine Learning, pp. 856-863 (2003)
Javed Khan, Jun S Wei, Markus Ringner, Lao H Saal, Marc Ladanyi, Frank Westermann, Frank Berthold, Manfred Schwab, Cristina R Antonescu, Carsten Peterson, Paul S Meltzer, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, vol. 7, pp. 673-679 (2001), 10.1038/89044