On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications.

Authors: Eyad Elyan, Khaled Fawagreh, Mohamed Medhat Gaber

Abstract: Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF, where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well known technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF, termed CLUB-DRF, that is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher performance in terms of accuracy. The latter refers to a technique called ensemble pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. Most of our experiments achieved a pruning level of 95% or above while retaining or outperforming the RF accuracy.
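
The two steps in the abstract (cluster similar trees, then keep one representative per cluster) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the similarity measure, or the rule for choosing a representative. The sketch assumes each tree is described by its vector of predicted labels on a common sample set, uses a simple k-medoids procedure with prediction disagreement as the distance, and picks the most accurate tree in each cluster. All function names and the toy data are hypothetical.

```python
from collections import Counter


def disagreement(a, b):
    """Fraction of samples on which two trees predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def accuracy(pred, y_true):
    """Fraction of samples a tree labels correctly."""
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)


def cluster_trees(preds, k, max_iter=20):
    """Assumed clustering step: deterministic k-medoids over trees.

    Returns a list of non-empty clusters, each a list of tree indices.
    """
    # Farthest-first initialisation: start at tree 0, then repeatedly add
    # the tree that is farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(preds)),
            key=lambda i: min(disagreement(preds[i], preds[m]) for m in medoids),
        ))
    clusters = []
    for _ in range(max_iter):
        # Assign every tree to its nearest medoid.
        clusters = [[] for _ in medoids]
        for i, p in enumerate(preds):
            nearest = min(range(len(medoids)),
                          key=lambda c: disagreement(p, preds[medoids[c]]))
            clusters[nearest].append(i)
        clusters = [c for c in clusters if c]  # drop empty clusters
        # New medoid: the member with the smallest total distance to its cluster.
        updated = [min(c, key=lambda i: sum(disagreement(preds[i], preds[j]) for j in c))
                   for c in clusters]
        if updated == medoids:
            break
        medoids = updated
    return clusters


def select_representatives(preds, y_true, k):
    """Assumed selection rule: keep the most accurate tree of each cluster."""
    clusters = cluster_trees(preds, k)
    return sorted(max(c, key=lambda i: accuracy(preds[i], y_true)) for c in clusters)


def majority_vote(preds, members):
    """Combine the chosen trees by per-sample majority vote."""
    n_samples = len(preds[0])
    return [Counter(preds[t][s] for t in members).most_common(1)[0][0]
            for s in range(n_samples)]


# Toy data (hypothetical): 6 trees, 10 samples, three groups of similar trees.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
tree_preds = [
    [1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1, 1, 1, 1],
]

representatives = select_representatives(tree_preds, y_true, k=3)
pruned_pred = majority_vote(tree_preds, representatives)
pruning_level = 1 - len(representatives) / len(tree_preds)
print(representatives, accuracy(pruned_pred, y_true), pruning_level)
```

In practice the prediction vectors would come from the individual trees of a trained forest evaluated on held-out data, and the number of clusters `k` directly sets the pruning level, since one tree survives per cluster.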