Deep Learning to Identify Transcription Start Sites from CAGE Data

Authors: Haiyan Hu, Xiaoman Li, Hansi Zheng

DOI: 10.1109/BIBM49941.2020.9313267

Abstract: Gene transcription start site (TSS) identification is important for understanding transcriptional gene regulation. Cap Analysis of Gene Expression (CAGE) experiments have recently become common practice for the direct measurement of TSSs. CAGE data now available in public databases have created unprecedented opportunities to study transcription initiation mechanisms under various cellular conditions. However, due to the potential noise inherent in CAGE data, in-silico methods are required to further distinguish bona fide TSSs from noise. Here we present dlCAGE, a computational approach based on an end-to-end deep neural network that identifies TSSs from CAGE data. dlCAGE incorporates de novo DNA regulatory motif features discovered by the DeepBind model architecture, as well as existing sequence and structural features. Test results on several cell lines, in comparison with current state-of-the-art approaches, showed its superior performance and its promise for TSS identification from CAGE experiments.
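
The abstract describes motif features learned with a DeepBind-style architecture. As a rough illustration only, the core of such an architecture can be sketched as a convolution of a motif filter over one-hot encoded DNA, followed by a ReLU and max pooling. This is not the dlCAGE or DeepBind code: the function names, the hand-written `TATA_FILTER` weights, and the example sequence are all hypothetical, and a real model would learn its filters from data.

```python
# Hypothetical sketch of a DeepBind-style motif detector in pure Python.
# Names, weights, and the example sequence are illustrative assumptions.
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of 4 values (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_scan(seq, motif_filter, bias=0.0):
    """Slide a (width x 4) filter along the sequence and apply ReLU at each offset."""
    encoded = one_hot(seq)
    width = len(motif_filter)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            row = encoded[start + offset]
            weights = motif_filter[offset]
            total += sum(x * w for x, w in zip(row, weights))
        scores.append(max(0.0, total))  # ReLU keeps only positive evidence
    return scores


def max_pool(scores):
    """Summarize a scan by its strongest match (0.0 if the sequence is too short)."""
    return max(scores) if scores else 0.0


# Toy filter rewarding the pattern "TATA"; columns follow the A, C, G, T order.
TATA_FILTER = [
    [-1.0, -1.0, -1.0, 1.0],  # position 1 prefers T
    [1.0, -1.0, -1.0, -1.0],  # position 2 prefers A
    [-1.0, -1.0, -1.0, 1.0],  # position 3 prefers T
    [1.0, -1.0, -1.0, -1.0],  # position 4 prefers A
]

feature = max_pool(motif_scan("GGCTATAAGG", TATA_FILTER))
print(feature)
```

In a full model, many such pooled filter outputs would be concatenated with other sequence and structural features and passed to further layers. That combination step is omitted here because the abstract does not specify it.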

References (22)
Babak Alipanahi, Andrew Delong, Matthew T Weirauch, Brendan J Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, vol. 33, pp. 831-838 (2015). 10.1038/NBT.3300
Rimantas Kodzius, Miki Kojima, Hiromi Nishiyori, Mari Nakamura, Shiro Fukuda, Michihira Tagami, Daisuke Sasaki, Kengo Imamura, Chikatoshi Kai, Matthias Harbers, Yoshihide Hayashizaki, Piero Carninci. CAGE: cap analysis of gene expression. Nature Methods, vol. 3, pp. 211-222 (2006). 10.1038/NMETH0306-211
Hiroko Ohmiya, Morana Vitezic, Martin C Frith, Masayoshi Itoh, Piero Carninci, Alistair RR Forrest, Yoshihide Hayashizaki, Timo Lassmann, FANTOM Consortium. RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE). BMC Genomics, vol. 15, p. 269 (2014). 10.1186/1471-2164-15-269
FANTOM Consortium. A promoter-level mammalian expression atlas. Nature, vol. 507, pp. 462-470 (2014). 10.1038/NATURE13182
Vanja Haberle, Alistair R.R. Forrest, Yoshihide Hayashizaki, Piero Carninci, Boris Lenhard. CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Research, vol. 43, pp. 1-11 (2015). 10.1093/NAR/GKV054
Jun Ding, Vikram Dhillon, Xiaoman Li, Haiyan Hu. Systematic discovery of cofactor motifs from ChIP-seq data by SIOMICS. Methods, vol. 79, pp. 47-51 (2015). 10.1016/J.YMETH.2014.08.006
Hazuki Takahashi, Timo Lassmann, Mitsuyoshi Murata, Piero Carninci. 5′ end–centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nature Protocols, vol. 7, pp. 542-561 (2012). 10.1038/NPROT.2012.005
E. Valen, A. Krogh, Y. Hayashizaki, P. Carninci, A. Sandelin, M. C. Frith. A code for transcription initiation in mammalian genomes. Genome Research, vol. 18, pp. 1-12 (2007). 10.1101/GR.6831208
Sepp Hochreiter, Jürgen Schmidhuber. Long short-term memory. Neural Computation, vol. 9, pp. 1735-1780 (1997). 10.1162/NECO.1997.9.8.1735
Shannon M. Ruppert, Mounir Chehtane, Ge Zhang, Haiyan Hu, Xiaoman Li, Annette R. Khaled. JunD/AP-1-Mediated Gene Expression Promotes Lymphocyte Growth Dependent on Interleukin-7 Signal Transduction. PLoS ONE, vol. 7, p. e32262 (2012). 10.1371/JOURNAL.PONE.0032262