An Empirical Study of Oversampling and Undersampling Methods for LCMine an Emerging Pattern Based Classifier

Authors: Octavio Loyola-González, Milton García-Borroto, Miguel Angel Medina-Pérez, José Fco. Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa

DOI: 10.1007/978-3-642-38989-4_27

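Abstract: Classifiers based on emerging patterns are usually more understandable for humans than those based on complex mathematical models. However, most classifiers based on emerging patterns achieve low accuracy on problems with imbalanced databases. This problem has been tackled through oversampling or undersampling methods; nevertheless, to the best of our knowledge, these methods have not been tested for classifiers based on emerging patterns. Therefore, in this paper, we present an empirical study of the use of oversampling and undersampling methods to improve the accuracy of a classifier based on emerging patterns. We apply the most popular oversampling and undersampling methods over 30 databases from the UCI Repository of Machine Learning. Our experimental results show that using oversampling and undersampling methods significantly improves the accuracy of the classifier for the minority class.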

References (19)
Philippe Lenca, Stéphane Lallich, Thanh-Nghi Do, Nguyen-Khang Pham. A comparison of different off-centered entropies to deal with class imbalance for decision trees. Knowledge Discovery and Data Mining, pp. 634-643 (2008). DOI: 10.1007/978-3-540-68125-0_59
Ronaldo C. Prati, Gustavo E. A. P. A. Batista, Maria Carolina Monard. A Study with Class Imbalance and Random Sampling for a Decision Tree Learning System. International Conference on Artificial Intelligence in Theory and Practice, pp. 131-140 (2008). DOI: 10.1007/978-0-387-09695-7_13
Janez Demšar. Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research, vol. 7, pp. 1-30 (2006).
Nitesh V. Chawla, David A. Cieslak, Sanjay Chawla, Wei Liu. A Robust Decision Tree Algorithm for Imbalanced Data Sets. SIAM International Conference on Data Mining, pp. 766-777 (2010).
Nitesh V. Chawla. Data Mining for Imbalanced Datasets: An Overview. The Data Mining and Knowledge Discovery Handbook, pp. 875-886 (2005). DOI: 10.1007/978-0-387-09823-4_45
Gustavo E. A. P. A. Batista, Ronaldo C. Prati, Maria Carolina Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, vol. 6, pp. 20-29 (2004). DOI: 10.1145/1007730.1007735
Milton García-Borroto, José Fco. Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa, Miguel Angel Medina-Pérez, José Ruiz-Shulcloper. LCMine: An efficient algorithm for mining discriminative regularities and its application in supervised classification. Pattern Recognition, vol. 43, pp. 3025-3034 (2010). DOI: 10.1016/J.PATCOG.2010.04.008
Oded Maimon, Lior Rokach. Data Mining and Knowledge Discovery Handbook (2005).
Andrew Estabrooks, Taeho Jo, Nathalie Japkowicz. A Multiple Resampling Method for Learning from Imbalanced Data Sets. Computational Intelligence, vol. 20, pp. 18-36 (2004). DOI: 10.1111/J.0824-7935.2004.T01-1-00228.X
Milton García-Borroto, José Fco. Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa. A survey of emerging patterns for supervised classification. Artificial Intelligence Review, vol. 42, pp. 705-721 (2014). DOI: 10.1007/S10462-012-9355-X