Evaluation Measures for Models Assessment over Imbalanced Data Sets

Authors: Taklit Akrouf Alitouche, Hassiba Kheliouane Djemaa, Mohamed Bekkar

Keywords: Imbalanced data, Model accuracy, G-means, F-Measure, Matthews correlation coefficient, ROC, P-AUC, W-AUC, Lift, AUL

Abstract: Learning from imbalanced data is one of the challenging problems in data mining, and finding the right model assessment measures is a primary research issue within it. A skewed class distribution causes common evaluation measures to be misread and leads to biased classification. This article presents a set of alternative measures for assessing learning on imbalanced data, using combined measures (G-means, likelihood ratios, discriminant power, F-measure, balanced accuracy, Youden index, Matthews correlation coefficient) and graphical performance assessment (ROC curve, area under the curve, partial AUC, weighted AUC, cumulative gains curve, lift chart, area under lift (AUL)), which aim to provide a more credible evaluation. We analyze the application of these measures to the evaluation of churn prediction models, a known application of imbalanced data.
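The combined measures named in the abstract can all be derived from a binary confusion matrix. The sketch below uses the commonly cited textbook definitions, which are an assumption here since the abstract itself gives no formulas. The function name `combined_measures` and the churn-style counts are hypothetical and chosen only for illustration. Degenerate cases (for example, a classifier that never predicts the minority class) would divide by zero and are not handled.

```python
import math


def combined_measures(tp, fn, fp, tn):
    """Combined evaluation measures from a binary confusion matrix.

    The positive class is assumed to be the minority class (e.g. churners).
    Formulas follow the usual textbook definitions; they are not taken
    from the abstract above.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)

    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "g_mean": math.sqrt(sensitivity * specificity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "youden_index": sensitivity + specificity - 1,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        # Positive and negative likelihood ratios
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Discriminant power: (sqrt(3) / pi) * (log odds of sens + log odds of spec)
        "discriminant_power": (math.sqrt(3) / math.pi) * (
            math.log(sensitivity / (1 - sensitivity))
            + math.log(specificity / (1 - specificity))
        ),
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
        ),
    }


if __name__ == "__main__":
    # Hypothetical data: 1000 customers, 100 of them churners (10% minority class).
    for name, value in combined_measures(tp=60, fn=40, fp=90, tn=810).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, plain accuracy is 0.87 even though only 60% of churners are detected, while balanced accuracy is 0.75 and the F-measure is 0.48. That gap is the kind of misreading under a skewed class distribution that the abstract describes.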
