A new variable selection approach using Random Forests

Authors: A. Hapfelmeier, K. Ulm

DOI: 10.1016/J.CSDA.2012.09.020

Keywords: Control (linguistics), Random forest, Mathematics, Feature selection, Word error rate, Machine learning, Artificial intelligence, Regression, Multiple comparisons problem, Permutation

Abstract: Random Forests are frequently applied because they achieve high prediction accuracy and can identify informative variables. Several approaches to variable selection have been proposed to combine and strengthen these qualities. An extensive review of the corresponding literature led to the development of a new approach that is based on the theoretical framework of permutation tests and meets important statistical properties. A comparison with eight other popular methods in three simulation studies and four real data applications indicated that the new approach can be used to control the test-wise or the family-wise error rate, provides higher power to distinguish relevant from irrelevant variables, and leads to models that are among the very best performing ones. In addition, it is equally applicable to regression and classification problems.
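The permutation-test idea behind the abstract can be illustrated with a minimal sketch, assuming it works variable by variable: compute the importance of X_j on the original data, recompute it many times after permuting X_j (which breaks any association between X_j and the response, i.e. the null hypothesis), and take the proportion of permuted importances at least as large as the observed one as a p-value. All names below (`importance_fn`, `abs_correlation_importance`, `permute_column`, `permutation_pvalues`, `select_variables`) are hypothetical. The absolute-correlation importance is only a cheap stand-in for a Random Forest importance measure so the example runs without third-party libraries; with a real forest, `importance_fn` would refit the forest on the permuted data and return its variable importance. Family-wise control here uses a Bonferroni correction, which is one standard choice and not necessarily the adjustment used in the paper.

```python
import random


def abs_correlation_importance(X, y, j):
    """Hypothetical stand-in for a Random Forest importance measure:
    the absolute Pearson correlation between column j of X and y."""
    xs = [row[j] for row in X]
    n = len(xs)
    mx, my = sum(xs) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column or response: no measurable association
    return abs(sxy) / (sxx * syy) ** 0.5


def permute_column(X, j, rng):
    """Copy of X with column j shuffled across rows; other columns untouched."""
    col = [row[j] for row in X]
    rng.shuffle(col)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]


def permutation_pvalues(X, y, importance_fn, n_perm=99, seed=0):
    """Permutation p-value for every column of X.

    importance_fn(X, y, j) is assumed to fit a model and return the
    importance of column j (a hypothetical interface for this sketch).
    """
    rng = random.Random(seed)
    pvals = []
    for j in range(len(X[0])):
        observed = importance_fn(X, y, j)
        # Null distribution: importance of X_j after its link to y is broken.
        null = [importance_fn(permute_column(X, j, rng), y, j) for _ in range(n_perm)]
        exceed = sum(1 for v in null if v >= observed)
        # Counting the observed statistic itself keeps the p-value above zero.
        pvals.append((exceed + 1) / (n_perm + 1))
    return pvals


def select_variables(pvals, alpha=0.05, family_wise=False):
    """Indices of variables declared relevant.

    Test-wise error rate: reject H0_j when p_j <= alpha.
    Family-wise error rate (Bonferroni): reject when p_j <= alpha / m.
    """
    threshold = alpha / len(pvals) if family_wise else alpha
    return [j for j, p in enumerate(pvals) if p <= threshold]


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    # Only column 0 drives the response; column 1 is pure noise.
    y = [2 * row[0] + rng.gauss(0, 0.1) for row in X]
    pvals = permutation_pvalues(X, y, abs_correlation_importance)
    print("p-values:", pvals)
    print("test-wise selection:", select_variables(pvals, alpha=0.05))
    print("family-wise selection:", select_variables(pvals, alpha=0.05, family_wise=True))
```

With n_perm = 99, the smallest attainable p-value is 1/100 = 0.01, so the number of permutations bounds how strict a threshold can be met; for family-wise control over many variables, more permutations are needed. When each permutation requires refitting a full Random Forest, this loop dominates the cost of the procedure.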
