Authors: Jun Meng, Dong Liu, Chao Sun, Yushi Luan
DOI: 10.1186/S12859-014-0423-X
Keywords:
Abstract: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcripts or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far fewer than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 features related to sequential structures was used to train the model. By applying feature selection with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method, we obtained the best subset of 47 features for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved 90% accuracy using datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. In summary, miPlantPreMat is based on structure-sequence features and an SVM, and it identifies both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.
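To make the idea of structure-sequence features more concrete, the following is a minimal Python sketch of how a candidate hairpin might be turned into a numeric feature vector before being passed to a classifier such as an SVM. It is an illustration only: the abstract does not list the 152 features used by the authors, so the function name `structure_sequence_features`, the choice of features (length, GC content, paired-base fraction, dinucleotide frequencies, per-nucleotide pairing) and the toy input are all assumptions. The secondary structure is assumed to be given in dot-bracket notation, as produced by common RNA folding tools.

```python
from collections import Counter
from itertools import product


def structure_sequence_features(seq, dot_bracket):
    """Return a dict of illustrative features for one candidate hairpin.

    NOTE: hypothetical feature set for illustration; it is not the
    152-feature set described in the abstract.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    n = len(seq)

    features = {
        "length": float(n),
        # Sequence-only feature: fraction of G and C nucleotides.
        "gc_content": (seq.count("G") + seq.count("C")) / n,
        # Structure-only feature: fraction of positions that are base-paired.
        "paired_fraction": sum(ch in "()" for ch in dot_bracket) / n,
    }

    # Sequence feature: frequency of each of the 16 dinucleotides.
    dinucs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = max(n - 1, 1)
    for a, b in product("ACGU", repeat=2):
        features["di_" + a + b] = dinucs[a + b] / total

    # Combined feature: fraction of positions holding a paired A, C, G or U.
    for base in "ACGU":
        paired = sum(
            1 for s, d in zip(seq, dot_bracket) if s == base and d in "()"
        )
        features["paired_" + base] = paired / n

    return features


if __name__ == "__main__":
    # Toy hairpin: a 3 bp stem closing a 3 nt loop (made-up example input).
    feats = structure_sequence_features("GGGAAACCC", "(((...)))")
    for name in ("length", "gc_content", "paired_fraction", "di_GG"):
        print(name, round(feats[name], 3))
```

A vector built this way for each real and pseudo pre-miRNA could then be fed to a feature selection step, which would discard the least informative entries before the final classifier is trained.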