On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications.

Authors: Eyad Elyan, Khaled Fawagreh, Mohamed Medhat Gaber

Abstract: Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF, where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well known technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF termed CLUB-DRF that is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher performance in terms of accuracy. The latter refers to a technique called ensemble pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. Most experiments achieved a pruning level of 95% or above while retaining or outperforming the accuracy of RF.
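
The two steps described in the abstract (group similar trees, then keep one representative per group) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: each tree is represented only by its vector of predictions on a labelled sample, similarity is Hamming distance, the clustering is a simple farthest-first seeding with nearest-seed assignment, the representative is the most accurate member of each cluster, and the function names and toy data are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of samples on which two prediction vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def cluster_trees(pred_vectors, k):
    """Group trees into k clusters (assumes 1 <= k <= number of trees).

    Stand-in clustering: farthest-first seed selection, then each tree
    joins its nearest seed. The abstract does not name the algorithm.
    """
    seeds = [0]
    while len(seeds) < k:
        # Next seed: the tree farthest from its closest existing seed.
        nxt = max(
            (i for i in range(len(pred_vectors)) if i not in seeds),
            key=lambda i: min(hamming(pred_vectors[i], pred_vectors[s]) for s in seeds),
        )
        seeds.append(nxt)
    clusters = {s: [] for s in seeds}
    for i, vec in enumerate(pred_vectors):
        nearest = min(seeds, key=lambda s: hamming(vec, pred_vectors[s]))
        clusters[nearest].append(i)
    return list(clusters.values())

def accuracy(preds, labels):
    """Fraction of samples predicted correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def prune(pred_vectors, labels, k):
    """Keep the most accurate tree of each cluster; return kept tree indices."""
    return sorted(
        max(cluster, key=lambda i: accuracy(pred_vectors[i], labels))
        for cluster in cluster_trees(pred_vectors, k)
    )

def majority_vote(pred_vectors, kept):
    """Combine the kept trees by majority vote, sample by sample."""
    n_samples = len(pred_vectors[0])
    return [
        Counter(pred_vectors[i][j] for i in kept).most_common(1)[0][0]
        for j in range(n_samples)
    ]

# Toy data (hypothetical): six trees, six labelled samples, binary classes.
labels = [0, 1, 1, 0, 1, 0]
preds = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],  # duplicate of tree 0, i.e. redundant
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
]

kept = prune(preds, labels, k=3)
pruning_level = 1 - len(kept) / len(preds)
ensemble_accuracy = accuracy(majority_vote(preds, kept), labels)
```

On this toy data the sketch keeps three of the six trees (a 50% pruning level), and the pruned ensemble's majority vote matches every label even though each individual tree makes at least one error. The number of clusters `k` directly sets the size of the pruned ensemble, so a higher pruning level corresponds to a smaller `k`.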

