Parameter determination and feature selection for C4.5 algorithm using scatter search approach

Authors: Shih-Wei Lin, Shih-Chieh Chen

DOI: 10.1007/S00500-011-0734-Z

Abstract: The C4.5 decision tree (DT) can be applied in various fields and discovers knowledge for human understanding. However, different problems typically require different parameter settings. Rule of thumb or trial-and-error methods are generally utilized to determine these settings, which may result in poor parameter settings and unsatisfactory results. On the other hand, although a dataset may contain numerous features, not all features are beneficial to the classification algorithm. Therefore, a novel scatter search-based approach (SS + DT) is proposed to acquire optimal parameter settings and to select the subset of features that yields better classification results. To evaluate the efficiency of the SS + DT approach, datasets from the UCI (University of California, Irvine) Machine Learning Repository are used to assess the performance of the approach. Experimental results demonstrate that the parameter settings for the C4.5 algorithm obtained by the SS + DT approach are better than those obtained by other approaches. When feature selection is considered, classification accuracy rates on most datasets are increased. The proposed approach can therefore effectively identify the best parameter settings and the useful features.

References (30)
Rasha S Abdule-Wahab, Nicolas Monmarché, Mohamed Slimane, Moaid A Fahdil, Hilal H Saleh, A Scatter Search Algorithm for the Automatic Clustering Problem. Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. pp. 350-364 (2006), 10.1007/11790853_28
Ron Kohavi, George H. John, Automatic Parameter Selection by Minimizing Estimated Error. Machine Learning Proceedings 1995. pp. 304-312 (1995), 10.1016/B978-1-55860-377-6.50045-1
Rafael Marti, Manuel Laguna, Scatter Search: Methodology and Implementations in C (2011)
Deborah R Carvalho, Alex A Freitas, A genetic-algorithm for discovering small-disjunct rules in data mining. Applied Soft Computing. vol. 2, pp. 75-88 (2002), 10.1016/S1568-4946(02)00031-5
M.J. Aitkenhead, A co-evolving decision tree classification method. Expert Systems With Applications. vol. 34, pp. 18-25 (2008), 10.1016/J.ESWA.2006.08.008