Imbalanced Learning Based on Data-Partition and SMOTE

Authors: Huaping Guo, Jun Zhou, Chang-An Wu

DOI: 10.3390/INFO9090238

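Abstract: Most conventional classification learning methods assume a relatively balanced class distribution, which is a significant drawback when classifying data with an imbalanced class distribution. This paper proposes a novel classification method based on data-partition and SMOTE for imbalanced learning. The proposed method differs from conventional ones in both the learning and prediction stages. For the learning stage, the proposed method uses the following three steps to learn a class-imbalance oriented model: (1) partitioning the majority class into several clusters using a data partition method such as K-Means, (2) constructing a novel training set using SMOTE on each data set obtained by merging each cluster with the minority class, and (3) learning a classification model on each training set using conventional classification learning methods including decision tree, SVM and neural network. A classifier repository consisting of several classification models is thereby constructed. With respect to the prediction stage, for a given example to be classified, the proposed method uses the partition model constructed in the learning stage to select a model from the classifier repository to predict the example. Comprehensive experiments on KEEL data sets show that the proposed method outperforms some other existing methods on the evaluation measures of recall, g-mean, f-measure and AUC.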
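The three learning steps and the prediction stage described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation. The function names (`nearest`, `kmeans`, `smote`, `fit`, `predict`) and the toy data are hypothetical. Three choices are assumptions made here for brevity: each training set is oversampled until the minority class matches the cluster size, model selection picks the cluster whose centroid is nearest to the example, and a 1-nearest-neighbour rule stands in for the decision tree, SVM or neural network base learner named in the abstract.

```python
import math
import random


def nearest(centroids, p):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], p))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-Means; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def smote(minority, n_new, rng, k=3):
    """SMOTE-style oversampling: interpolate between a minority point and
    one of its k nearest minority neighbours (needs >= 2 minority points)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def fit(majority, minority, k, seed=0):
    """Learning stage: partition, oversample per cluster, store one model each."""
    rng = random.Random(seed)
    # Step 1: partition the majority class into k clusters.
    centroids = kmeans(majority, k, rng)
    clusters = [[] for _ in range(k)]
    for p in majority:
        clusters[nearest(centroids, p)].append(p)
    repository = []
    for cluster in clusters:
        # Step 2: merge the cluster with the minority class and oversample
        # (assumption: balance the two classes within each training set).
        n_new = max(0, len(cluster) - len(minority))
        positives = minority + smote(minority, n_new, rng)
        # Step 3: "learn" a model. For 1-NN (a stand-in base learner) the
        # model is simply the labelled training set.
        repository.append([(p, 0) for p in cluster] + [(p, 1) for p in positives])
    return centroids, repository


def predict(model, x):
    """Prediction stage: select a model via the partition, then classify x."""
    centroids, repository = model
    train = repository[nearest(centroids, x)]
    return min(train, key=lambda item: math.dist(item[0], x))[1]


# Toy data (hypothetical): two majority blobs and a small minority group.
majority = [[x * 0.5, y * 0.5] for x in range(5) for y in range(5)]
majority += [[10 + x * 0.5, y * 0.5] for x in range(5) for y in range(5)]
minority = [[5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [5.5, 5.5], [5.2, 5.3]]

model = fit(majority, minority, k=2)
print(predict(model, [1.0, 1.0]), predict(model, [5.2, 5.3]))
```

Label 0 denotes the majority class and label 1 the minority class. In practice one would likely swap in library implementations for the clustering, oversampling and base learner; the structure of the classifier repository stays the same.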
