Imbalanced class learning in epigenetics.

Authors: M. Muksitul Haque, Michael K. Skinner, Lawrence B. Holder

DOI: 10.1089/CMB.2014.0008

Abstract: In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. Datasets with a large ratio between minority and majority classes face hindrance in learning using any classifier. Datasets having a magnitude difference in the number of instances of the target concept result in an imbalanced class distribution. Such datasets can range from biological data and sensor data to medical diagnostics, or any other domain where labeling can be time-consuming or costly or the data may not be easily available. The current study investigates algorithms for solving the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently come with few differentially DNA methylated regions (DMR) and far more non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that an imbalanced dataset can yield accuracy similar to that of a regular learner on a balanced dataset.
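The hindrance described in the abstract can be illustrated with a minimal sketch. The class counts below (50 DMR versus 950 non-DMR sites) are hypothetical and not taken from the study, and the function names `imbalance_ratio` and `majority_baseline` are illustrative only. The sketch shows that a trivial predictor which always outputs the majority class reaches high accuracy while finding none of the minority instances, which is why plain accuracy can mislead on imbalanced data.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-to-minority class count ratio."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline(labels, minority):
    """Score a predictor that always outputs the most frequent class.

    Returns (accuracy, recall on the minority class).
    """
    counts = Counter(labels)
    majority = counts.most_common(1)[0][0]
    predictions = [majority] * len(labels)

    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # Minority instances that the predictor actually recovered.
    minority_hits = sum(
        p == y == minority for p, y in zip(predictions, labels)
    )
    recall = minority_hits / counts[minority]
    return accuracy, recall


# Hypothetical counts, chosen only to illustrate the effect.
labels = ["DMR"] * 50 + ["non-DMR"] * 950

ratio = imbalance_ratio(labels)
accuracy, recall = majority_baseline(labels, "DMR")
print(f"imbalance ratio: {ratio:.1f}")
print(f"majority-baseline accuracy: {accuracy:.2f}")
print(f"minority (DMR) recall: {recall:.2f}")
```

Under these assumed counts the baseline is right 95% of the time without identifying a single DMR, so comparisons between imbalanced-class algorithms are usually more informative when they also report a minority-class measure such as recall.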
