Authors: J. S. Rozowsky, D. Newburger, F. Sayward, J. Wu, G. Jordan
DOI: 10.1101/GR.5696007
Keywords:
Abstract: For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and that many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR, and four of five sequenced PCR products confirm connectivity unambiguously.
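The association step described in the abstract (linking a novel TAR to a proximal exon with a correlated expression profile) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of Pearson correlation, the `max_distance` and `min_corr` thresholds, the dictionary field names, and the gene labels are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def associate_tar(tar, exons, max_distance=10_000, min_corr=0.8):
    """Return the gene of the best-correlated proximal exon, or None.

    max_distance and min_corr are illustrative thresholds, not values from the paper.
    """
    best_gene, best_r = None, min_corr
    for exon in exons:
        if exon["chrom"] != tar["chrom"]:
            continue
        # Gap between the two intervals; 0 if they overlap.
        gap = max(exon["start"] - tar["end"], tar["start"] - exon["end"], 0)
        if gap > max_distance:
            continue
        r = pearson(tar["profile"], exon["profile"])
        if r >= best_r:
            best_gene, best_r = exon["gene"], r
    return best_gene


# Hypothetical data: expression profiles across four conditions.
tar = {"chrom": "chr1", "start": 1000, "end": 1100, "profile": [1.0, 2.0, 3.0, 4.0]}
exons = [
    {"gene": "GENE_A", "chrom": "chr1", "start": 2000, "end": 2200,
     "profile": [2.0, 4.1, 6.0, 8.2]},   # nearby and correlated
    {"gene": "GENE_B", "chrom": "chr1", "start": 500000, "end": 500200,
     "profile": [1.0, 2.0, 3.0, 4.0]},   # correlated but too distant
    {"gene": "GENE_C", "chrom": "chr1", "start": 3000, "end": 3100,
     "profile": [4.0, 3.0, 2.0, 1.0]},   # nearby but anti-correlated
]
print(associate_tar(tar, exons))
```

In this sketch a TAR that finds no sufficiently correlated nearby exon returns `None`, which corresponds loosely to the "unclassified" TARs that the abstract says are passed on to the clustering step.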