Characterization and identification of long non-coding RNAs based on feature relationship

Authors: Guangyu Wang, Hongyan Yin, Boyang Li, Chunlei Yu, Fan Wang

DOI: 10.1101/327882

Abstract: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has gained intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Here we first characterize lncRNAs by contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between ORF (open reading frame) length and GC content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on the feature relationship, accordingly, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility in computational analysis of lncRNAs in a wide range of species.
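
The two features named in the abstract, ORF length and GC content, can be illustrated with a minimal Python sketch. This is not the LGC algorithm, and the abstract does not say how ORFs are defined or how the two features are combined. The function names and the ORF definition below (forward strand only, `ATG` start, first in-frame standard stop codon) are assumptions made for illustration.

```python
# Illustrative sketch only: computes the two features mentioned in the abstract.
# Assumptions (not from the source): forward strand only, ATG start codon,
# standard stop codons, longest ORF taken across the three reading frames.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF on the forward strand, or ''."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    best = ""
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def orf_features(seq: str) -> tuple[int, float]:
    """Return (ORF length in nt, GC content of that ORF) for one transcript."""
    orf = longest_orf(seq)
    return len(orf), gc_content(orf)


if __name__ == "__main__":
    length, gc = orf_features("CCATGGCCGGCTAAGG")
    print(length, round(gc, 3))
```

Computing this pair for every transcript in a set would give the raw inputs for examining how ORF length varies with GC content. How LGC models that relationship is described in the paper itself, not here.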
