Mixed-sampling approach to unbalanced data distributions: a case study involving Leukemia's document profiling

Authors: Wu QingQiang, Liu Hua, Liu KunHong

Abstract: The types of leukemia and their relationships to the literature are introduced, and on this basis a leukemia data set for classification is constructed from original sources such as Cancer Gene Census, PubMed, and gene2pubmed. The imbalanced data set is the research object. Based on an introduction of current methods for imbalanced data sets, the problems in sampling are analyzed, and a mixed-sampling method is proposed to classify the data set. The multi-class problem is transformed into two-class problems. The Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) is used to evaluate the method. Experiments are then performed to verify the efficiency and stability of eight methods, and the results are comparatively analyzed. The proposed method is found to achieve the best performance. Finally, the work of this paper is summarized and future work is outlined.
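The three steps named in the abstract (reducing a multi-class problem to two-class problems, resampling an imbalanced set, and scoring with AUC) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the mixed sampling is done, so the sketch assumes random duplication of the minority class combined with random removal from the majority class, and the function names `one_vs_rest`, `mixed_sample`, and `auc` are hypothetical.

```python
import random


def one_vs_rest(labels, positive):
    """Turn multi-class labels into 1 (the chosen class) vs 0 (all others)."""
    return [1 if lab == positive else 0 for lab in labels]


def mixed_sample(rows, labels, seed=0):
    """Resample a two-class set so both classes reach the mean class size.

    Assumption: the minority class is oversampled by random duplication and
    the majority class is undersampled by random removal. The paper's own
    mixed-sampling scheme may differ.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, lab in zip(rows, labels):
        by_class[lab].append(row)
    if not by_class[0] or not by_class[1]:
        raise ValueError("both classes need at least one example")
    target = (len(by_class[0]) + len(by_class[1])) // 2
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        if len(members) >= target:
            # Undersample: keep a random subset of the larger class.
            chosen = rng.sample(members, target)
        else:
            # Oversample: pad the smaller class with random duplicates.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([lab] * target)
    return out_rows, out_labels


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a setup like this, resampling would be applied only to the training portion of each two-class problem, and `auc` would be computed on untouched test data so the score reflects the original class distribution.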
