Class imbalance methods for translation initiation site recognition

Authors: Nicolás García-Pedrajas, Colin Fyfe, Domingo Ortiz-Boyer, María D. García-Pedrajas

DOI: 10.5555/1945758.1945797

Keywords: Speech recognition, Translation initiation sites, Artificial intelligence, Minority class, Gene recognition, Task (project management), Machine learning, Class imbalance, Stop codon, Computer science

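Abstract: Translation initiation sites (TIS) recognition is one of the first steps in gene structure prediction, and one of the common components in any gene recognition system. Many methods have been described in the literature to identify TIS in transcripts such as mRNA, EST and cDNA sequences. However, the recognition of TIS in DNA sequences is a far more challenging task, and the methods described so far for transcripts achieve poor results in DNA sequences. Most methods approach this problem taking into account its biological features. In this work we try a different view, considering this classification problem from a purely machine learning perspective. From the point of view of machine learning, TIS recognition is a class imbalance problem. Thus, in this paper we approach TIS recognition from this angle, and apply the different methods that have been developed to deal with imbalanced datasets. Results show an advantage of class imbalance methods with respect to the same methods applied without considering the class imbalance nature of the problem. The applied methods are also able to improve the results obtained with the best method in the literature, which is based on looking for the next in-frame stop codon from the putative TIS that must be predicted.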
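The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of ATG as the only start codon, the three standard stop codons, and plain random undersampling as the class imbalance method are all assumptions chosen for illustration. It enumerates candidate sites in a DNA string, finds the next in-frame stop codon from a putative TIS, and balances a labeled dataset by discarding majority-class examples at random.

```python
import random

# Assumption: the three standard stop codons; the record does not list them.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_sites(seq):
    """Return the 0-based positions of every ATG in seq (putative TIS)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def next_in_frame_stop(seq, start):
    """Return the position of the first stop codon in frame with start, or None."""
    seq = seq.upper()
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def random_undersample(examples, labels, seed=0):
    """Randomly drop majority-class examples until both classes have equal size.

    Labels are assumed to be 1 (true TIS) or 0 (non-TIS).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [examples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    dna = "CCATGAAATGCCCTAAGGATGTTTTGA"  # made-up sequence for illustration
    for site in candidate_sites(dna):
        print(site, next_in_frame_stop(dna, site))
```

Because a genomic sequence contains many ATG triplets and only a few are true initiation sites, the negative class dominates; undersampling is one simple way to rebalance the training set before fitting a classifier.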

References (15)
Charles X. Ling, Chenghui Li. Data mining for direct marketing: problems and solutions. Knowledge Discovery and Data Mining, pp. 73-79 (1998).
Huiqing Liu, Jinyan Li, Limsoon Wong, Hao Han. Using amino acid patterns to accurately predict translation initiation sites. In Silico Biology, vol. 4, pp. 255-269 (2004).
Miroslav Kubat, Robert C. Holte, Stan Matwin. Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning, vol. 30, pp. 195-215 (1998). DOI: 10.1023/A:1007452223027
Salvador García, Francisco Herrera. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evolutionary Computation, vol. 17, pp. 275-306 (2009). DOI: 10.1162/EVCO.2009.17.3.275
Ricardo Barandela, José Salvador Sánchez, Vicente García, Edgar Rangel. Strategies for learning in class imbalance problems. Pattern Recognition, vol. 36, pp. 849-851 (2003). DOI: 10.1016/S0031-3203(02)00257-1
Scott Cost, Steven Salzberg. A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Machine Learning, vol. 10, pp. 57-78 (1993). DOI: 10.1023/A:1022664626993
Yvan Saeys, Thomas Abeel, Sven Degroeve, Yves Van de Peer. Translation initiation site prediction on a genomic scale. Intelligent Systems in Molecular Biology, vol. 23, pp. 418-423 (2007). DOI: 10.1093/BIOINFORMATICS/BTM177
Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, vol. 40, pp. 3358-3378 (2007). DOI: 10.1016/J.PATCOG.2007.04.009
Jianxin Wu, Zhi-Hua Zhou, Xu-Ying Liu. Exploratory Undersampling for Class-Imbalance Learning. Systems, Man and Cybernetics, vol. 39, pp. 539-550 (2009). DOI: 10.1109/TSMCB.2008.2007853
Robert E. Schapire, Yoav Freund. Experiments with a new boosting algorithm. International Conference on Machine Learning, pp. 148-156 (1996).