A Survey of Predictive Modelling under Imbalanced Distributions

Authors: Luis Torgo, Paula Branco, Rita P. Ribeiro

Abstract: Many real world data mining applications involve obtaining predictive models using data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with events that are highly relevant for end users (e.g. fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them in the available training data, creates serious problems for predictive modelling techniques. This paper presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods, and finally refer to some related problems within predictive modelling.
