Authors: Philippe Lenca, Stéphane Lallich, Thanh-Nghi Do, Nguyen-Khang Pham
DOI: 10.1007/978-3-540-68125-0_59
Keywords:
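Abstract: In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. In decision tree learning, many measures are based on the concept of Shannon's entropy. A major characteristic of these entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the class imbalance problem, we proposed an off-centered entropy, which takes its maximum value for a distribution fixed by the user. This distribution can be the a priori distribution of the class variable modalities or a distribution taking into account the costs of misclassification. Other authors have proposed an asymmetric entropy. In this paper we present the concepts of the three entropies and compare their effectiveness on 20 imbalanced data sets. All our experiments are based on the C4.5 decision tree algorithm, in which only the entropy function is modified. The results are promising and show the interest of off-centered entropies for dealing with the problem of class imbalance.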
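The idea of an entropy whose maximum sits at a user-chosen distribution can be illustrated for the two-class case with a minimal sketch. The abstract does not give a formula, so the construction below is an assumption: the class frequency `p` is remapped piecewise-linearly so that the user-fixed value `theta` lands on 0.5, and Shannon's entropy is then applied to the remapped value. The names `shannon_entropy`, `off_centered_entropy`, and `theta` are hypothetical and chosen for illustration only.

```python
import math


def shannon_entropy(p: float) -> float:
    """Binary Shannon entropy in bits; its maximum (1.0) is at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def off_centered_entropy(p: float, theta: float) -> float:
    """Illustrative off-centered entropy for two classes.

    Assumption (not stated in the abstract): p is remapped linearly on
    [0, theta] and on [theta, 1] so that theta is sent to 0.5, and the
    Shannon entropy of the remapped value is returned. The maximum is
    therefore reached at p = theta instead of p = 0.5.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p <= theta:
        remapped = p / (2.0 * theta)
    else:
        remapped = (p + 1.0 - 2.0 * theta) / (2.0 * (1.0 - theta))
    return shannon_entropy(remapped)


if __name__ == "__main__":
    # With a 10% minority class taken as the reference distribution,
    # a node at exactly that frequency is treated as maximally uncertain.
    for p in (0.0, 0.05, 0.1, 0.3, 0.5, 1.0):
        print(p, off_centered_entropy(p, theta=0.1))
```

With `theta = 0.5` the remapping is the identity and the sketch reduces to ordinary Shannon entropy, which is the uniform-maximum behavior the abstract describes. In a C4.5-style tree, a function of this kind would take the place of the entropy used to score candidate splits.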