A comparison of different off-centered entropies to deal with class imbalance for decision trees

Authors: Philippe Lenca, Stéphane Lallich, Thanh-Nghi Do, Nguyen-Khang Pham

DOI: 10.1007/978-3-540-68125-0_59

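Abstract: In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. In decision tree learning, many measures are based on the concept of Shannon's entropy. A major characteristic of these entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the class imbalance problem, we proposed an off-centered entropy which takes its maximum value for a distribution fixed by the user. This distribution can be the a priori distribution of the class variable modalities or a distribution taking into account the costs of misclassification. Other authors have proposed an asymmetric entropy. In this paper we present the concepts of the three entropies and compare their effectiveness on 20 imbalanced data sets. All our experiments are based on the C4.5 decision tree algorithm, in which only the entropy function is modified. The results are promising and show the interest of off-centered entropies in dealing with the problem of class imbalance.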
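The central idea of the abstract, an entropy whose maximum sits at a user-chosen distribution and not at the uniform one, can be illustrated for the two-class case. The sketch below is only an assumption about one way to obtain that behavior: it rescales the class probability `p` piecewise-linearly so that a chosen value `theta` maps to 0.5, then applies ordinary Shannon entropy. The function names and the parameter `theta` are hypothetical, and the record above does not confirm that this is the exact definition used in the paper.

```python
import math


def shannon_entropy(p):
    """Binary Shannon entropy in bits; maximal (1.0) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def off_centered_entropy(p, theta):
    """Binary entropy whose maximum is moved from 0.5 to theta.

    Assumption: a piecewise-linear recentering of p, not necessarily
    the paper's definition. theta must lie strictly between 0 and 1.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be strictly between 0 and 1")
    if p <= theta:
        # Map [0, theta] linearly onto [0, 0.5].
        pi = p / (2 * theta)
    else:
        # Map (theta, 1] linearly onto (0.5, 1].
        pi = (p + 1 - 2 * theta) / (2 * (1 - theta))
    return shannon_entropy(pi)
```

With `theta` set to the prior of the minority class (say 0.1), a node whose class distribution equals the prior is treated as maximally uncertain, whereas standard Shannon entropy would already score it as fairly pure. With `theta = 0.5` the function reduces to the usual Shannon entropy.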
