Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine.

Authors: Jun Meng, Dong Liu, Chao Sun, Yushi Luan

DOI: 10.1186/S12859-014-0423-X

Abstract: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.
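The abstract does not list the 152 features, so the following is only an illustration of what a "structure-sequence" feature can look like, not the authors' feature set. It sketches the triplet encoding used by earlier pre-miRNA classifiers such as triplet-SVM: each interior position contributes its nucleotide together with the paired or unpaired state of itself and its two neighbours, giving 4 × 8 = 32 possible elements. The function name `triplet_features` and the toy hairpin are assumptions made for this example; the dot-bracket structure is assumed to come from an external folding tool.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 triplet elements.

    sequence  -- RNA (or DNA) sequence, e.g. "GGGAAACCC"
    structure -- dot-bracket secondary structure of the same length
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    if set(structure) - set("()."):
        raise ValueError("structure must use only '(', ')' and '.'")

    seq = sequence.upper().replace("T", "U")
    # Both bracket directions mean "paired", so collapse ')' into '('.
    paired = structure.replace(")", "(")

    # All 32 keys, e.g. "A(((", "A((.", ..., "U...".
    keys = [n + "".join(s) for n in NUCLEOTIDES for s in product("(.", repeat=3)]

    counts = Counter()
    for i in range(1, len(seq) - 1):
        if seq[i] in NUCLEOTIDES:  # skip ambiguous bases such as N
            counts[seq[i] + paired[i - 1:i + 2]] += 1

    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in keys}


# Hypothetical toy hairpin: three base pairs closing a three-nucleotide loop.
features = triplet_features("GGGAAACCC", "(((...)))")
```

A vector like `features` would be one block of the input to an SVM; a feature-selection step such as the B-SVM-RFE method named in the abstract would then prune the full feature set to the most informative subset.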
