A hybrid under-sampling approach for mining unbalanced datasets: applications to banking and insurance

Authors: Madireddi Vasu, Vadlamani Ravi

DOI: 10.1504/IJDMMM.2011.038812

Abstract: In solving unbalanced classification problems, machine learning algorithms are overwhelmed by the majority class and consequently misclassify minority observations. Here, we propose a hybrid under-sampling approach to improve the performance of classifiers. The proposed approach first employs the k-reverse nearest neighbour (kRNN) method to detect outliers in the majority class. After the outliers are removed, K-means clustering is used to select K clusters, which further reduces the influence of the majority class. Then, support vector machine (SVM), logistic regression (LR), multilayer perceptron (MLP), radial basis function network (RBF), group method of data handling (GMDH), genetic programming (GP) and decision tree (J48) classifiers were employed for classification. The effectiveness of the approach was demonstrated on datasets taken from insurance fraud detection and credit card churn prediction in the banking domain. Ten-fold cross-validation was used in this study. It is observed that improved …
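The two under-sampling steps described in the abstract (kRNN outlier removal from the majority class, then K-means clustering) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives neither the outlier threshold nor the cluster-selection rule. The sketch therefore assumes that a majority point is an outlier when it has fewer than `min_rnn` reverse nearest neighbours, that the number of clusters equals the minority-class size, and that the majority point closest to each centroid is kept. All function and parameter names are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reverse_knn_counts(points, k):
    """Count, for each point, how many other points have it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # k nearest neighbours of point i, excluding the point itself
        others = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            counts[j] += 1
    return counts

def remove_outliers_krnn(points, k, min_rnn=1):
    """Drop points with fewer than min_rnn reverse nearest neighbours (assumed outlier rule)."""
    counts = reverse_knn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_rnn]

def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd's K-means; returns final centroids and the last cluster assignment."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, n_clusters)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

def hybrid_undersample(majority, minority, k=3, min_rnn=1, seed=0):
    """Assumed pipeline: kRNN outlier removal, then keep one representative per K-means cluster."""
    cleaned = remove_outliers_krnn(majority, k, min_rnn)
    n_clusters = min(len(minority), len(cleaned))
    centroids, clusters = kmeans(cleaned, n_clusters, seed=seed)
    reduced = []
    for centroid, members in zip(centroids, clusters):
        if members:
            # keep the real majority sample nearest to the cluster centroid
            reduced.append(min(members, key=lambda p: euclidean(p, centroid)))
    return reduced
```

For example, `hybrid_undersample(majority, minority, k=3)` returns at most `len(minority)` majority points. These would then be combined with the minority samples and passed to classifiers such as SVM, LR or MLP under ten-fold cross-validation. Keeping the sample nearest each centroid, rather than the centroid itself, is a design choice of this sketch: the reduced training set then contains only real observations.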