A Comparison of Two Oversampling Techniques (SMOTE vs MTDF) for Handling Class Imbalance Problem: A Case Study of Customer Churn Prediction

Authors: Adnan Amin, Faisal Rahim, Imtiaz Ali, Changez Khan, Sajid Anwar

DOI: 10.1007/978-3-319-16486-1_22

Abstract: Predicting customer behavior is of great importance to a project manager. Data-driven industries such as telecommunications take advantage of various data mining techniques to extract meaningful information about customers' future behavior. However, the prediction accuracy of these techniques is significantly affected if the real-world data is highly imbalanced. In this study, we investigate and compare the predictive performance of two well-known oversampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) and the Megatrend Diffusion Function (MTDF), together with four different rule generation algorithms (Exhaustive, Genetic, Covering, and LEM2) based on rough set classification, using publicly available data sets. Useful feature extraction can play a vital role not only in improving classification performance but also in reducing computational cost and complexity by eliminating unnecessary features from the dataset. The Minimum Redundancy Maximum Relevance (mRMR) technique is used in the proposed study; it selects the best feature subset and reduces the feature space. The results clearly demonstrate the predictive performance of both oversampling techniques and of the rule generation algorithms, which will help decision makers and researchers select the most suitable one.
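
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea in plain Python, not the implementation used in the study; the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Illustrative sketch only: `minority` is a list of numeric tuples (assumed input format).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class with two features
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (1.2, 1.8)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. A production workflow would more likely rely on an established library implementation than on a hand-written routine like this one.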
