Authors: Lívia Silva, Felipe de Souza Teixeira, José Ortega, Luis Zárate, Cristiane Nobre
DOI: 10.1186/1471-2164-12-S4-S9
Keywords:
Abstract: The accurate prediction of the initiation of translation in mRNA sequences is an important activity for genome annotation. However, obtaining an accurate prediction is not always a simple task, and it can be modeled as a classification problem between positive sequences (protein codifiers) and negative sequences (non-codifiers). The problem is highly imbalanced because each mRNA molecule has a unique translation initiation site (TIS) and various other sites that are not initiators. Therefore, this study focuses on the problem from the perspective of balancing classes, and we present an undersampling balancing method, M-clus, which is based on clustering. The method also adds features to the sequences and improves the performance of the classifier through the inclusion of knowledge obtained by the model, called InAKnow. Through this methodology, the performance measures used (accuracy, sensitivity, specificity and adjusted accuracy) are greater than 93% for the Mus musculus and Rattus norvegicus organisms, and varied between 72.97% and 97.43% for the other organisms evaluated: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Nasonia vitripennis. The precision increases significantly, by 39% and 22.9% for Mus musculus and Rattus norvegicus, respectively, when the knowledge obtained by the model is included. For the other organisms, the precision increases by between 37.10% and 59.49%. The inclusion of certain features during training, for example the presence of ATG in the upstream region of the translation initiation site, improves the rate of sensitivity by approximately 7%. Using the M-clus balancing method generates a significant increase in the rate of sensitivity, from 51.39% to 91.55% (Mus musculus) and from 47.45% to 88.09% (Rattus norvegicus). In order to solve the problem of TIS prediction, the results indicate that the methodology proposed in this work is adequate, particularly when using the concept of acquired knowledge, which increased the accuracy in all databases evaluated.
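The abstract reports accuracy, sensitivity, specificity, adjusted accuracy and precision without defining them. The following is a minimal sketch of how such measures are commonly computed from confusion-matrix counts. It assumes that adjusted accuracy is the mean of sensitivity and specificity (the abstract does not state the formula), that every denominator is non-zero, and it uses hypothetical counts that are not taken from the study.

```python
def performance_measures(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion counts.

    tp/fn: true TIS sites predicted as TIS / missed.
    tn/fp: non-TIS sites predicted as non-TIS / wrongly predicted as TIS.
    Assumes every denominator below is non-zero.
    """
    sensitivity = tp / (tp + fn)          # fraction of true TIS recovered
    specificity = tn / (tn + fp)          # fraction of non-TIS rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumption: adjusted accuracy taken as the mean of the two class rates.
    adjusted_accuracy = (sensitivity + specificity) / 2
    precision = tp / (tp + fp)            # fraction of predicted TIS that are real
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "adjusted_accuracy": adjusted_accuracy,
        "precision": precision,
    }


# Hypothetical imbalanced data: 100 true TIS versus 9,900 non-TIS candidates.
measures = performance_measures(tp=50, fp=100, tn=9800, fn=50)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts the accuracy is 0.985 even though only half of the true sites are found (sensitivity 0.5), which is why an imbalanced problem like this one is usually judged on sensitivity, precision and a class-averaged accuracy rather than on raw accuracy alone.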