Authors: Xiaotong Shen, Yongli Zhang
DOI: 10.1002/SAM.V3:5
Keywords: Model selection, Mathematical optimization, Feature selection, Brute-force search, Upper and lower bounds, Least-angle regression, Clustering high-dimensional data, Computer science, Sample size determination, Data mining, Bayesian information criterion
Abstract: For high-dimensional regression, the number of predictors may greatly exceed the sample size, but only a small fraction of them are related to the response. Therefore, variable selection is inevitable, where consistent model selection is the primary concern. However, conventional consistent model selection criteria like the Bayesian information criterion (BIC) may be inadequate due to their nonadaptivity to the model space and the infeasibility of exhaustive search. To address these two issues, we establish a probability lower bound of selecting the smallest true model by an information criterion, based on which we propose a model selection criterion, what we call RICc, which adapts to the model space. Furthermore, we develop a computationally feasible method combining the computational power of least angle regression (LAR) with that of RICc. Both theoretical and simulation studies show that this method identifies the smallest true model with probability converging to one if the smallest true model is selected by LAR. The proposed method is applied to real data from the power market and outperforms backward variable selection in terms of price forecasting accuracy. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 350-358, 2010