Model selection procedure for high-dimensional data

Authors: Xiaotong Shen, Yongli Zhang

DOI: 10.1002/SAM.V3:5

Keywords: Model selection, Mathematical optimization, Feature selection, Brute-force search, Upper and lower bounds, Least-angle regression, Clustering high-dimensional data, Computer science, Sample size determination, Data mining, Bayesian information criterion

Abstract: For high-dimensional regression, the number of predictors may greatly exceed the sample size, but only a small fraction of them are related to the response. Therefore, variable selection is inevitable, and consistent model selection is the primary concern. However, conventional criteria such as the Bayesian information criterion (BIC) may be inadequate because of their nonadaptivity to the model space and the infeasibility of exhaustive search. To address these two issues, we establish a probability lower bound for selecting the smallest true model by an information criterion, based on which we propose a criterion, which we call RICc, that adapts to the model space. Furthermore, we develop a computationally feasible method that combines the computational power of least angle regression (LAR) with that of RICc. Both theoretical and simulation studies show that this method identifies the smallest true model with probability converging to one if the smallest true model is selected by LAR. The proposed method is applied to real market data and outperforms backward selection in terms of price forecasting accuracy. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 350-358, 2010
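The computational idea in the abstract, letting LAR generate a short sequence of nested candidate models and then choosing among them with an information criterion instead of searching all subsets of the predictors, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' procedure: the abstract does not give the form of RICc, so the sketch scores models with a hypothetical stand-in criterion, n·log(RSS_k/n) + 2k·log p, borrowing the 2 log p per-variable penalty of the risk inflation criterion (Foster and George, 1994). The helper names (`lar_order`, `select_along_path`, `rss_of`), the step cap `max_steps`, and the simulated data are likewise illustrative assumptions.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b (A square, nonsingular) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def standardize(v):
    """Center v and scale it to unit Euclidean norm."""
    m = sum(v) / len(v)
    d = [x - m for x in v]
    s = math.sqrt(dot(d, d))
    return [x / s for x in d]


def lar_order(cols, y, max_steps):
    """Entry order of plain LAR (no lasso step); cols standardized, y centered."""
    n, p = len(y), len(cols)
    mu = [0.0] * n  # current LAR fit
    c = [dot(x, y) for x in cols]
    active = [max(range(p), key=lambda j: abs(c[j]))]
    while len(active) < min(max_steps, p):
        resid = [yi - mi for yi, mi in zip(y, mu)]
        c = [dot(x, resid) for x in cols]
        C = max(abs(c[j]) for j in active)
        # Sign-adjusted active columns and the equiangular direction u.
        XA = [[math.copysign(1.0, c[j]) * v for v in cols[j]] for j in active]
        g = solve([[dot(s, t) for t in XA] for s in XA], [1.0] * len(XA))
        AA = 1.0 / math.sqrt(sum(g))
        u = [AA * sum(gk * xk[i] for gk, xk in zip(g, XA)) for i in range(n)]
        a_u = [dot(x, u) for x in cols]
        # Smallest positive step at which an inactive predictor ties the active correlation.
        gamma, nxt = math.inf, None
        for j in range(p):
            if j in active:
                continue
            for num, den in ((C - c[j], AA - a_u[j]), (C + c[j], AA + a_u[j])):
                if abs(den) > 1e-12 and 1e-12 < num / den < gamma:
                    gamma, nxt = num / den, j
        if nxt is None:
            break
        mu = [m + gamma * ui for m, ui in zip(mu, u)]
        active.append(nxt)
    return active


def rss_of(cols, y, subset):
    """Residual sum of squares of the least-squares fit of y on cols[subset]."""
    if not subset:
        return dot(y, y)
    X = [cols[j] for j in subset]
    beta = solve([[dot(s, t) for t in X] for s in X], [dot(s, y) for s in X])
    fit = [sum(bk * xk[i] for bk, xk in zip(beta, X)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))


def select_along_path(cols, y, max_steps):
    """Score every LAR prefix with a HYPOTHETICAL stand-in criterion (not RICc):
    n*log(RSS_k/n) + 2*k*log(p), a risk-inflation-style 2 log p penalty per predictor."""
    n, p = len(y), len(cols)
    order = lar_order(cols, y, max_steps)
    scores = [n * math.log(rss_of(cols, y, order[:k]) / n) + 2 * k * math.log(p)
              for k in range(len(order) + 1)]
    best_k = min(range(len(scores)), key=scores.__getitem__)
    return order, order[:best_k]


# Hypothetical demo data: 20 random standardized predictors; only 0 and 2 matter.
random.seed(0)
n, p = 60, 20
cols = [standardize([random.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
y_raw = [3 * cols[0][i] - 2 * cols[2][i] + 0.05 * random.gauss(0, 1) for i in range(n)]
y_bar = sum(y_raw) / n
y = [v - y_bar for v in y_raw]
order, chosen = select_along_path(cols, y, max_steps=10)
print("LAR entry order:", order)
print("selected predictors:", sorted(chosen))
```

Restricting the search to prefixes of the LAR entry order means at most `max_steps + 1` models are scored rather than all 2^p subsets. Whether the smallest true model is recovered then hinges on LAR placing the relevant predictors first, which mirrors the condition stated in the abstract.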

References (16)
E. J. Hannan, B. G. Quinn. The Determination of the Order of an Autoregression. Journal of the Royal Statistical Society: Series B (Methodological), vol. 41, pp. 190-195 (1979). DOI: 10.1111/J.2517-6161.1979.TB01072.X
Dean P. Foster, Edward I. George. The risk inflation criterion for multiple regression. The Annals of Statistics, vol. 22, pp. 1947-1975 (1994). DOI: 10.1214/AOS/1176325766
Hui Zou, Trevor Hastie, Robert Tibshirani. On the “degrees of freedom” of the lasso. The Annals of Statistics, vol. 35, pp. 2173-2192 (2007). DOI: 10.1214/009053607000000127
Yongli Zhang. Model selection: A Lagrange optimization approach. Journal of Statistical Planning and Inference, vol. 139, pp. 3142-3159 (2009). DOI: 10.1016/J.JSPI.2009.02.016
Hirotugu Akaike. Information Theory and an Extension of the Maximum Likelihood Principle. International Symposium on Information Theory, vol. 1, pp. 610-624 (1973). DOI: 10.1007/978-1-4612-1694-0_15
Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani. Least angle regression (with discussion). The Annals of Statistics, vol. 32, pp. 407-499 (2004). DOI: 10.1214/009053604000000067
Xiaotong Shen, Jianming Ye. Adaptive Model Selection. Journal of the American Statistical Association, vol. 97, pp. 210-221 (2002). DOI: 10.1198/016214502753479356
Edward I. George, Dean P. Foster. Calibration and empirical Bayes variable selection. Biometrika, vol. 87, pp. 731-747 (2000). DOI: 10.1093/BIOMET/87.4.731
Ritei Shibata. An optimal selection of regression variables. Biometrika, vol. 68, pp. 45-54 (1981). DOI: 10.1093/BIOMET/68.1.45