A parsimonious mixture of Gaussian trees model for oversampling in imbalanced and multimodal time-series classification.

Authors: Hong Cao, Vincent Y. F. Tan, John Z. F. Pang

DOI: 10.1109/TNNLS.2014.2308321

Abstract: We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model the possibly multimodal minority class and thereby address the problem of imbalanced time-series classification. By exploiting the fact that close-by time points are highly correlated due to the smoothness of the time series, our model significantly reduces the number of covariance parameters to be estimated from O(d^2) to O(Ld), where L is the number of mixture components and d is the dimensionality. Thus, our model is particularly effective for modeling high-dimensional time series with a limited number of instances in the positive class. In addition, the computational complexity of learning the model is only of the order O(L n_+ d^2), where n_+ is the number of positively labeled samples. We conduct extensive classification experiments based on several well-known time-series data sets (both single- and multimodal) by first randomly generating synthetic instances from our learned mixture model to correct the imbalance. We then compare our classification results with several state-of-the-art oversampling techniques, and the results demonstrate that when our proposed model is used in oversampling, the same support vector machines classifier achieves much better classification accuracy across the range of data sets. In fact, the proposed method achieves the best average performance 30 times out of 36 according to the F-value metric. Our approach is also competitive compared with nonoversampling-based classifiers for dealing with imbalanced time-series data.
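The oversampling idea in the abstract can be illustrated with a minimal sketch. It assumes a single tree component (L = 1) and leaves out the mixture fitting and the classifier; it is not the authors' implementation. The function names `fit_gaussian_tree` and `sample_gaussian_tree` are hypothetical. The tree is chosen with the standard Chow-Liu procedure (a maximum spanning tree over pairwise dependence), and synthetic instances are drawn by sampling each time point conditionally on its parent in the tree.

```python
import numpy as np

def fit_gaussian_tree(X):
    """Fit one tree-structured Gaussian to the rows of X (assumes d >= 2)."""
    n, d = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Gaussian mutual information, -0.5*log(1 - rho^2), is monotone in |rho|,
    # so a maximum spanning tree on |rho| gives the Chow-Liu tree.
    weight = np.abs(np.corrcoef(X, rowvar=False))
    parent = {0: None}   # node 0 is an arbitrary root
    order = [0]          # nodes in the order they join the tree
    # Naive Prim's algorithm, kept simple rather than efficient.
    while len(order) < d:
        best = None
        for i in order:
            for j in range(d):
                if j not in parent and (best is None or weight[i, j] > best[0]):
                    best = (weight[i, j], i, j)
        _, i, j = best
        parent[j] = i
        order.append(j)
    return mu, cov, parent, order

def sample_gaussian_tree(model, n_samples, rng):
    """Draw synthetic instances by ancestral sampling along the tree."""
    mu, cov, parent, order = model
    out = np.empty((n_samples, len(mu)))
    for j in order:
        p = parent[j]
        if p is None:
            out[:, j] = rng.normal(mu[j], np.sqrt(cov[j, j]), n_samples)
        else:
            # Conditional Gaussian of node j given its parent p.
            slope = cov[j, p] / cov[p, p]
            cond_var = max(cov[j, j] - slope * cov[j, p], 0.0)
            cond_mean = mu[j] + slope * (out[:, p] - mu[p])
            out[:, j] = cond_mean + np.sqrt(cond_var) * rng.standard_normal(n_samples)
    return out

# Toy minority class: 50 smooth series of length 8 (random walks).
rng = np.random.default_rng(0)
X_minority = np.cumsum(rng.standard_normal((50, 8)), axis=1)
model = fit_gaussian_tree(X_minority)
X_synthetic = sample_gaussian_tree(model, 200, rng)
```

Only the d variances and the d - 1 edge covariances of `cov` are used, so one tree needs 2d - 1 covariance parameters instead of the d(d + 1)/2 of a full covariance matrix. With L components this is consistent with the O(Ld) count stated in the abstract.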

References (32)
Hong Cao, Xiao-Li Li, Yew-Kwong Woon, See-Kiong Ng. SPO: Structure Preserving Oversampling for Imbalanced Time Series Classification. International Conference on Data Mining, pp. 1008-1013 (2011). DOI: 10.1109/ICDM.2011.137
Hong Cao, Minh Nhut Nguyen, Clifton Phua, Shonali Krishnaswamy, Xiao-Li Li. An integrated framework for human activity classification. Ubiquitous Computing, pp. 331-340 (2012). DOI: 10.1145/2370216.2370268
Klaus-U. Höffgen. Learning and robust learning of product distributions. Conference on Learning Theory, pp. 77-83 (1993). DOI: 10.1145/168304.168314
Gustavo E. A. P. A. Batista, Ronaldo C. Prati, Maria Carolina Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, vol. 6, pp. 20-29 (2004). DOI: 10.1145/1007730.1007735
Angela Simpson, Vincent Y. F. Tan, John Winn, Markus Svensén, Christopher M. Bishop, David E. Heckerman, Iain Buchan, Adnan Custovic. Beyond Atopy. American Journal of Respiratory and Critical Care Medicine, vol. 181, pp. 1200-1206 (2010). DOI: 10.1164/RCCM.200907-1101OC
Yuchun Tang, Yan-Qing Zhang, N.V. Chawla, S. Krasser. SVMs Modeling for Highly Imbalanced Classification. Systems Man and Cybernetics, vol. 39, pp. 281-288 (2009). DOI: 10.1109/TSMCB.2008.2002909
Taeho Jo, Nathalie Japkowicz. Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter, vol. 6, pp. 40-49 (2004). DOI: 10.1145/1007730.1007737
Vincent Y. F. Tan, Animashree Anandkumar, Alan S. Willsky. Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures. IEEE Transactions on Signal Processing, vol. 58, pp. 2701-2714 (2010). DOI: 10.1109/TSP.2010.2042478
Hong Cao, Xiao-Li Li, David Yew-Kwong Woon, See-Kiong Ng. Integrated Oversampling for Imbalanced Time Series Classification. IEEE Transactions on Knowledge and Data Engineering, vol. 25, pp. 2809-2822 (2013). DOI: 10.1109/TKDE.2013.37
Andrew Estabrooks, Taeho Jo, Nathalie Japkowicz. A Multiple Resampling Method for Learning from Imbalanced Data Sets. Computational Intelligence, vol. 20, pp. 18-36 (2004). DOI: 10.1111/J.0824-7935.2004.T01-1-00228.X