Mining data with random forests: current options for real-world applications

Authors: Andreas Ziegler, Inke R. König

DOI: 10.1002/WIDM.1114

Keywords:

Abstract: Random forests are fast, flexible, and represent a robust approach to mining high-dimensional data. They are an extension of classification and regression trees (CART). They perform well even in the presence of a large number of features and a small number of observations. In analogy to CART, random forests can deal with continuous outcomes, categorical outcomes, and time-to-event outcomes with censoring. The tree-building process implicitly allows for interactions between features and for high correlation between features. Approaches are available for measuring variable importance and for reducing the number of features. Although random forests are used in many applications, their theoretical properties are not fully understood. Recently, several articles have provided a better understanding of random forests, and we summarize these findings. We survey different versions of random forests, including random forests for classification, for probability estimation, and for estimating survival. We discuss the consequences of (1) no selection, (2) random selection, and (3) a combination of deterministic and random selection of features for random forests. Finally, we review a backward elimination and a forward procedure, the determination of trees representing a forest, and the identification of important variables in a random forest. We also provide a brief overview of areas of application. WIREs Data Mining Knowl Discov 2014, 4:55–63. doi: 10.1002/widm.1114. For further resources related to this article, please visit the WIREs website.
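
The abstract names several ingredients: CART-style trees, a choice between using all features or a random subset at each split, and variable importance. The following is a minimal pure-Python sketch of how these fit together for classification. It assumes Gini-impurity splits and majority voting as in Breiman's classification forests, and it uses the permutation (out-of-bag accuracy drop) variant of variable importance, which is only one of several approaches. The function names (`fit_forest`, `permutation_importance`, etc.) and the parameter `mtry` are illustrative and hypothetical, not taken from the article. Setting `mtry` to the number of features corresponds to no feature selection (bagging). A smaller `mtry` corresponds to random feature selection.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Among the candidate features, find the threshold with the lowest weighted Gini."""
    best = None  # (score, feature, threshold)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, mtry, rng, min_size=2):
    """Grow an unpruned CART-style tree; mtry features are drawn at random at each node."""
    if len(set(y)) == 1 or len(y) < min_size:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, features)
    if split is None:  # every drawn feature is constant in this node
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], mtry, rng, min_size),
            build_tree([X[i] for i in right], [y[i] for i in right], mtry, rng, min_size))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, mtry=None, seed=0):
    """mtry=None uses all p features at every node (bagging, i.e. no feature selection)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = p if mtry is None else mtry
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))      # out-of-bag rows for this tree
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng)
        forest.append((tree, oob))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree, _ in forest)
    return votes.most_common(1)[0][0]

def permutation_importance(forest, X, y, feature, rng):
    """Mean per-tree drop in OOB accuracy after permuting one feature among the OOB rows."""
    drops = []
    for tree, oob in forest:
        if not oob:
            continue
        rows = [list(X[i]) for i in oob]
        acc = sum(predict_tree(tree, r) == y[i] for r, i in zip(rows, oob)) / len(oob)
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        for r, v in zip(rows, shuffled):
            r[feature] = v
        acc_perm = sum(predict_tree(tree, r) == y[i] for r, i in zip(rows, oob)) / len(oob)
        drops.append(acc - acc_perm)
    return sum(drops) / len(drops) if drops else 0.0

if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data (illustrative only): the class depends on feature 0; features 1-3 are noise.
    X = [[rng.random() for _ in range(4)] for _ in range(200)]
    y = [int(row[0] > 0.5) for row in X]
    forest = fit_forest(X, y, n_trees=30, mtry=2, seed=0)
    acc = sum(predict_forest(forest, r) == yi for r, yi in zip(X, y)) / len(y)
    imps = [permutation_importance(forest, X, y, f, rng) for f in range(4)]
    print(f"training accuracy: {acc:.2f}")
    print("permutation importances:", [round(v, 3) for v in imps])
```

On this toy data, feature 0 should receive a clearly larger permutation importance than the noise features. The sketch omits regression and survival outcomes, probability estimation, and the feature-reduction procedures that the article surveys.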
