Improvement in the prediction of the translation initiation site through balancing methods, inclusion of acquired knowledge and addition of features to sequences of mRNA.

Authors: Lívia Silva, Felipe de Souza Teixeira, José Ortega, Luis Zárate, Cristiane Nobre

DOI: 10.1186/1471-2164-12-S4-S9

Abstract: The accurate prediction of the initiation of translation in sequences of mRNA is an important activity for genome annotation. However, obtaining an accurate prediction is not always a simple task, and it can be modeled as a problem of classification between positive sequences (protein codifiers) and negative sequences (non-codifiers). The problem is highly imbalanced because each molecule of mRNA has a unique translation initiation site and various others that are not initiators. Therefore, this study focuses on the problem from the perspective of balancing classes, and we present an undersampling balancing method, M-clus, which is based on clustering. The method also adds features to the sequences and improves the performance of the classifier through the inclusion of knowledge obtained by the model, called InAKnow. Through this methodology, the measures of performance used (accuracy, sensitivity, specificity and adjusted accuracy) are greater than 93% for the Mus musculus and Rattus norvegicus organisms, and varied between 72.97% and 97.43% for the other organisms evaluated: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Nasonia vitripennis. The precision increases significantly, by 39% and 22.9% for Mus musculus and Rattus norvegicus, respectively, when the knowledge obtained by the model is included. For the other organisms, the precision increases between 37.10% and 59.49%. The inclusion of certain features during training, for example the presence of ATG in the upstream region of the Translation Initiation Site, improves the rate of sensitivity by approximately 7%. Using the M-Clus balancing method generates a significant increase in the rate of sensitivity, from 51.39% to 91.55% (Mus musculus) and from 47.45% to 88.09% (Rattus norvegicus). In order to solve the problem of TIS prediction, the results indicate that the methodology proposed in this work is adequate, particularly when using the concept of acquired knowledge, which increased the accuracy in all databases evaluated.
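
The performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The confusion-matrix counts are hypothetical, the function name `classification_metrics` is invented for illustration, and adjusted accuracy is assumed here to be the mean of sensitivity and specificity, since the abstract does not define it. The example shows why plain accuracy can look high on an imbalanced TIS dataset while sensitivity stays low.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard measures from confusion-matrix counts.

    tp/fn: true TIS sites predicted as TIS / as non-TIS.
    tn/fp: non-TIS sites predicted as non-TIS / as TIS.
    """
    sensitivity = tp / (tp + fn)          # fraction of real TIS recovered
    specificity = tn / (tn + fp)          # fraction of non-TIS rejected
    precision = tp / (tp + fp)            # fraction of TIS calls that are right
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumption: adjusted accuracy taken as the mean of the two class rates.
    adjusted_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "adjusted_accuracy": adjusted_accuracy,
    }


# Hypothetical imbalanced test set: 100 true TIS against 9,900 non-TIS sites.
m = classification_metrics(tp=50, fp=100, tn=9800, fn=50)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, accuracy is 98.5% even though only half of the true sites are found, which is the kind of distortion that class balancing is meant to address.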
