Authors: Eyad Elyan, Khaled Fawagreh, Mohamed Medhat Gaber
DOI:
Keywords:
Abstract: Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF, where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well-known technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF, termed CLUB-DRF, that is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher performance in terms of accuracy. The latter refers to a technique called pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. Most experiments achieved a pruning level of 95% or above while retaining or outperforming the accuracy of RF.
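The two steps described in the abstract (group similar trees, then keep one representative per group) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: each tree is represented by its vector of predicted labels on a common sample set, similarity is measured by Hamming distance, the clustering is a simple k-medoids loop, and the representative is the most accurate tree in each cluster. The function names `cluster_trees` and `select_representatives` are hypothetical.

```python
def hamming(a, b):
    """Number of samples on which two trees disagree."""
    return sum(x != y for x, y in zip(a, b))


def cluster_trees(pred_vectors, k, max_iter=20):
    """Group trees by prediction similarity (assumed: k-medoids, Hamming distance)."""
    # Deterministic farthest-first initialisation of the k medoids.
    medoids = [0]
    while len(medoids) < k:
        candidates = (i for i in range(len(pred_vectors)) if i not in medoids)
        medoids.append(max(
            candidates,
            key=lambda i: min(hamming(pred_vectors[i], pred_vectors[m]) for m in medoids),
        ))
    for _ in range(max_iter):
        # Assign every tree to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, vec in enumerate(pred_vectors):
            nearest = min(medoids, key=lambda m: hamming(vec, pred_vectors[m]))
            clusters[nearest].append(i)
        # Re-pick each medoid as the member closest to all other members.
        new_medoids = [
            min(members, key=lambda c: sum(hamming(pred_vectors[c], pred_vectors[j])
                                           for j in members))
            for members in clusters.values() if members
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [members for members in clusters.values() if members]


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def select_representatives(pred_vectors, y, k):
    """Keep one tree per cluster (assumed criterion: highest accuracy on y)."""
    clusters = cluster_trees(pred_vectors, k)
    return sorted(max(members, key=lambda i: accuracy(pred_vectors[i], y))
                  for members in clusters)


# Toy data: 6 hypothetical trees, each a list of predicted labels for 6 samples.
y = [0, 1, 0, 1, 1, 0]
preds = [
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
]
kept = select_representatives(preds, y, k=2)
pruning_level = 1 - len(kept) / len(preds)
print(kept)  # [0, 3]
```

In this toy run two of six trees are kept, so the pruning level is about 67%. The number of clusters `k` directly sets the size of the pruned forest, which is why a small `k` relative to the original forest size corresponds to the high pruning levels the abstract reports.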