Authors: Guangyu Wang, Hongyan Yin, Boyang Li, Chunlei Yu, Fan Wang
DOI: 10.1101/327882
Keywords:
Abstract: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Here we first characterize lncRNAs by contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between ORF (open reading frame) length and GC content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on the feature relationship, accordingly, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
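
The two features named in the abstract, ORF length and GC content, can be computed per transcript as in the minimal sketch below. It is not the LGC algorithm: the abstract does not give the model that relates the two features, so only feature extraction is shown. The function names `gc_content` and `longest_orf_length` are hypothetical, and the ORF definition used here (ATG to the first in-frame stop codon, forward strand only, standard stop codons) is an assumption.

```python
# Illustrative feature extraction only; not the LGC scoring model.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG...stop ORF in the three
    forward reading frames, counting the stop codon. Returns 0 if none."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATTTGGGTAACC"  # made-up toy sequence
    print(longest_orf_length(transcript), gc_content(transcript))
```

Computing both values for a set of annotated coding and non-coding transcripts and plotting one against the other is one way to inspect the divergence the abstract describes.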