Machine-learning classifiers for imbalanced tornado data

Authors: Theodore B. Trafalis, Indra Adrianto, Michael B. Richman, S. Lakshmivarahan

DOI: 10.1007/S10287-013-0174-6

Keywords: Probabilistic logic, Rare events, Pattern recognition, Feature (machine learning), Support vector machine, Computer science, Artificial intelligence, Feature selection, Tornado, Severe weather, Machine learning, Random forest

Abstract: Learning from imbalanced data, where the number of observations in one class is significantly larger than the ones in the other class, has gained considerable attention in the machine learning community. Assuming the difficulty in predicting each class is similar, most standard classifiers will tend to predict the majority class well. This study applies tornado data that are highly imbalanced, as they are rare events. The severe weather data used herein have thunderstorm circulations (mesocyclones) that produce tornadoes in approximately 6.7 % of the total number of observations. However, since tornadoes are high impact weather events, it is important to predict the minority class with high accuracy. In this study, we apply support vector machines (SVMs) and logistic regression with and without a midpoint threshold adjustment on the probabilistic outputs, random forest, and rotation forest for tornado prediction. Feature selection with SVM-recursive feature elimination was also performed to identify the most important features or variables for predicting tornadoes. The results showed that the SVMs provided better performance compared to the other classifiers.
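The imbalance problem described in the abstract can be illustrated with a small sketch. It is not the authors' data or procedure: the scores below are hand-constructed synthetic values, the 6.7 % positive rate is the only number taken from the abstract, and the helper names (`confusion_counts`, `accuracy`, `recall`) and the lowered threshold of 0.25 are assumptions chosen purely for illustration. The sketch compares an "always predict the majority class" baseline with a probabilistic classifier cut at the 0.5 midpoint and at a lower threshold.

```python
"""Toy illustration: accuracy vs. minority-class recall under class imbalance."""


def confusion_counts(y_true, probs, threshold):
    """Return (tp, fp, fn, tn), predicting class 1 when prob >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all observations classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    """Fraction of true minority-class cases that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Synthetic data (assumption): 1000 observations, 6.7 % in the minority class.
N_POS, N_NEG = 67, 933
y_true = [1] * N_POS + [0] * N_NEG

# Hand-constructed scores (assumption): minority cases spread over roughly
# 0.1 to 0.9, majority cases spread over 0.0 to just under 0.3.
pos_probs = [0.105 + 0.8 * j / (N_POS - 1) for j in range(N_POS)]
neg_probs = [0.3 * i / N_NEG for i in range(N_NEG)]
probs = pos_probs + neg_probs

# Baseline that never predicts the minority class.
baseline = confusion_counts(y_true, [0.0] * len(y_true), 0.5)
# Probabilistic output cut at the midpoint threshold.
midpoint = confusion_counts(y_true, probs, 0.5)
# Same output with a lowered (hypothetical) threshold.
lowered = confusion_counts(y_true, probs, 0.25)

for name, counts in [("baseline", baseline),
                     ("threshold 0.50", midpoint),
                     ("threshold 0.25", lowered)]:
    print(f"{name}: accuracy={accuracy(*counts):.3f} recall={recall(*counts):.3f}")
```

In this toy setting the baseline reaches high accuracy while detecting no minority cases, and lowering the threshold trades some accuracy (more false alarms) for higher minority-class recall. That trade-off is why accuracy alone is a weak yardstick for rare events.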

References (27)
John S. Baras, Alvaro A. Cárdenas. B-ROC curves for the assessment of classifiers over imbalanced data sets. National Conference on Artificial Intelligence, pp. 1581-1584 (2006).
Stan Matwin, Miroslav Kubat. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. International Conference on Machine Learning, pp. 179-186 (1997).
Foster Provost, Tom Fawcett, Ron Kohavi. The Case against Accuracy Estimation for Comparing Induction Algorithms. International Conference on Machine Learning, pp. 445-453 (1998).
Theodore B. Trafalis, Huseyin Ince, Michael B. Richman. Tornado detection with support vector machines. International Conference on Computational Science, pp. 289-298 (2003). DOI: 10.1007/3-540-44864-0_30
Robert McGill, John W. Tukey, Wayne A. Larsen. Variations of Box Plots. The American Statistician, vol. 32, pp. 12-16 (1978). DOI: 10.1080/00031305.1978.10479236
Robert J. Tibshirani, Bradley Efron. An Introduction to the Bootstrap (1993).
Michael B. Richman. Rotation of principal components. International Journal of Climatology, vol. 6, pp. 293-335 (1986). DOI: 10.1002/JOC.3370060305