Authors: Michiel J.L. de Hoon, Piero Carninci, Wyeth W. Wasserman, Yoshihide Hayashizaki, Harukazu Suzuki
DOI: 10.1101/634261
Keywords: Biology, Microsatellite, Transcription (biology), Cap analysis gene expression, Promoter, Computational biology, Enhancer, Gene, Transcriptional noise, Repertoire
Abstract: Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. To determine whether these TSSs, sometimes referred to as ‘transcriptional noise’ or ‘junk’, are relevant nonetheless, we look for novel and conserved regulatory motifs located in their vicinity. We show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines. Biochemical and genetic evidence further demonstrates that these CAGEs correspond to TSSs of mostly sense and intronic non-coding RNAs, whose transcription rate can be predicted with 81% accuracy by a sequence-based deep learning model. Excitingly, our model predicts that variants linked to human diseases affect this STR-associated transcription. Together, our results extend the repertoire of transcription and provide a valuable resource for future studies of complex traits.