*-DCC: A platform to collect, annotate, and explore a large variety of sequencing experiments

Authors: Matthias Hörtenhuber, Abdul K Mukarram, Marcus H Stoiber, James B Brown, Carsten O Daub

DOI: 10.1093/gigascience/giaa024

Abstract:
Background: Over the past few years, the variety of experimental designs and protocols for sequencing experiments has increased greatly. To ensure the wide usability of the produced data beyond an individual project, rich and systematic annotation of the underlying experiments is crucial.
Findings: We first developed an annotation structure that captures the overall experimental design as well as the relevant details of the steps from the biological sample to the library preparation, the sequencing procedure, and the sequencing and processed files. Through various design features, such as controlled vocabularies and different field requirements, we ensured high annotation quality, comparability, and ease of annotation. The structure can be easily adapted to a large variety of species. We then implemented the annotation strategy in a user-hosted web platform with data import, query, and export functionality.
Conclusions: We present here an annotation structure and user-hosted platform for sequencing experiment data, suitable for lab-internal documentation, collaborations, and large-scale annotation efforts.
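
The abstract describes the annotation structure only at a high level. The following is a hypothetical sketch, not the *-DCC schema or code: it models the annotation levels named in the abstract (experimental design, biological sample, library preparation, sequencing, files) as a small Python schema in which each field has a requirement level and, optionally, a controlled vocabulary. All field names, vocabulary terms, and the required/recommended/optional split are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical requirement levels; the abstract only mentions
# "different field requirements", not these exact names.
REQUIRED, RECOMMENDED, OPTIONAL = "required", "recommended", "optional"


@dataclass(frozen=True)
class FieldSpec:
    name: str
    requirement: str
    vocabulary: Optional[frozenset] = None  # None means free text


# Hypothetical schema, one entry per annotation level, following the
# design -> sample -> library -> sequencing -> file chain in the abstract.
# Every field name and vocabulary term here is illustrative only.
SCHEMA = {
    "experiment": [
        FieldSpec("design", REQUIRED, frozenset({"time course", "knockout", "treatment"})),
        FieldSpec("description", OPTIONAL),
    ],
    "biological_sample": [
        FieldSpec("species", REQUIRED, frozenset({"Homo sapiens", "Mus musculus", "Danio rerio"})),
        FieldSpec("tissue", RECOMMENDED),
    ],
    "library": [
        FieldSpec("assay", REQUIRED, frozenset({"RNA-seq", "ChIP-seq", "ATAC-seq"})),
        FieldSpec("protocol", RECOMMENDED),
    ],
    "sequencing": [
        FieldSpec("read_layout", REQUIRED, frozenset({"single-end", "paired-end"})),
    ],
    "file": [
        FieldSpec("file_type", REQUIRED, frozenset({"fastq", "bam", "bigwig"})),
        FieldSpec("path", REQUIRED),
    ],
}


def validate(record: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a record with one entry per level.

    Missing required fields and values outside a controlled vocabulary
    are errors; missing recommended fields are only warnings.
    """
    errors, warnings = [], []
    for level, specs in SCHEMA.items():
        entry = record.get(level, {})
        for spec in specs:
            value = entry.get(spec.name)
            if value is None:
                if spec.requirement == REQUIRED:
                    errors.append(f"{level}.{spec.name}: missing required field")
                elif spec.requirement == RECOMMENDED:
                    warnings.append(f"{level}.{spec.name}: recommended field is empty")
            elif spec.vocabulary is not None and value not in spec.vocabulary:
                errors.append(f"{level}.{spec.name}: '{value}' not in controlled vocabulary")
    return errors, warnings


# Example record with a misspelled assay term and two empty recommended fields.
example = {
    "experiment": {"design": "time course"},
    "biological_sample": {"species": "Danio rerio"},
    "library": {"assay": "RNAseq"},
    "sequencing": {"read_layout": "paired-end"},
    "file": {"file_type": "fastq", "path": "reads_R1.fastq.gz"},
}
errors, warnings = validate(example)
print(errors)
print(warnings)
```

Here the misspelled term "RNAseq" is rejected as an error, while the empty `tissue` and `protocol` fields only produce warnings. In this sketch, adapting the structure to another species means extending the `species` vocabulary rather than changing the validator. A real annotation system would also allow several samples, libraries, and files per experiment, which this flattened one-entry-per-level layout leaves out.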

References (14)
Ron Edgar, Michael Domrachev, Alex E Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research, vol. 30, pp. 207-210 (2002). doi:10.1093/nar/30.1.207
T. Barrett, K. Clark, R. Gevorgyan, V. Gorelenkov, E. Gribov, I. Karsch-Mizrachi, M. Kimelman, K. D. Pruitt, S. Resenchuk, T. Tatusova, E. Yaschenko, J. Ostell. BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research, vol. 40, pp. 57-63 (2012). doi:10.1093/nar/gkr1163
Nicole L. Washington, E. O. Stinson, Marc D. Perry, Peter Ruzanov, Sergio Contrino, Richard Smith, Zheng Zha, Rachel Lyne, Adrian Carr, Paul Lloyd, Ellen Kephart, Sheldon J. McKay, Gos Micklem, Lincoln D. Stein, Suzanna E. Lewis. The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database, vol. 2011 (2011). doi:10.1093/database/bar023
Tatiparthi BK Reddy, Alex D Thomas, Dimitri Stamatis, Jon Bertsch, Michelle Isbandi, Jakob Jansson, Jyothi Mallajosyula, Ioanna Pagani, Elizabeth A Lobos, Nikos C Kyrpides. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification. Nucleic Acids Research, vol. 43, pp. 1099-1106 (2015). doi:10.1093/nar/gku950
Susan E Celniker, Laura AL Dillon, Mark B Gerstein, Kristin C Gunsalus, Steven Henikoff, Gary H Karpen, Manolis Kellis, Eric C Lai, Jason D Lieb, David M MacAlpine, Gos Micklem, Fabio Piano, Michael Snyder, Lincoln Stein, Kevin P White, Robert H Waterston, modENCODE Consortium. Unlocking the secrets of the genome. Nature, vol. 459, pp. 927-930 (2009). doi:10.1038/459927a
Y. Kodama, M. Shumway, R. Leinonen. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Research, vol. 40, pp. 54-56 (2012). doi:10.1093/nar/gkr854
Haihan Tan, Daria Onichtchouk, Cecilia Winata. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish. Zebrafish, vol. 13, pp. 54-60 (2016). doi:10.1089/zeb.2015.1179
Cricket A. Sloan, Esther T. Chan, Jean M. Davidson, Venkat S. Malladi, J. Seth Strattan, Benjamin C. Hitz, Idan Gabdank, Aditi K. Narayanan, Marcus Ho, Brian T. Lee, Laurence D. Rowe, Timothy R. Dreszer, Greg Roe, Nikhil R. Podduturi, Forrest Tanaka, Eurie L. Hong, J. Michael Cherry. ENCODE data at the ENCODE portal. Nucleic Acids Research, vol. 44, pp. 726-732 (2016). doi:10.1093/nar/gkv1160
EA Feingold, PJ Good, MS Guyer, S Kamholz, L Liefer, K Wetterstrand, FS Collins, TR Gingeras, D Kampa, EA Sekinger, J Cheng, H Hirsch, S Ghosh, Z Zhu, S Patel, A Piccolboni, A Yang, H Tammana, S Bekiranov, P Kapranov, R Harrison, G Church, K Struhl, B Ren, TH Kim, LO Barrera, C Qu, S Van Calcar, R Luna, CK Glass, MG Rosenfeld, R Guigo, SE Antonarakis, E Birney, M Brent, L Pachter, A Reymond, ET Dermitzakis, C Dewey, D Keefe, F Denoeud, J Lagarde, J Ashurst, T Hubbard, JJ Wesselink, R Castelo, E Eyras, RM Myers, A Sidow, S Batzoglou, ND Trinklein, SJ Hartman, SF Aldred, E Anton, DI Schroeder, SS Marticke, L Nguyen, J Schmutz, J Grimwood, M Dickson, GM Cooper, EA Stone, G Asimenos, M Brudno, A Dutta, N Karnani, CM Taylor, HK Kim, G Robins, G Stamatoyannopoulos, JA Stamatoyannopoulos, M Dorschner, P Sabo, M Hawrytycz, R Humbert, J Wallace, M Yu, PA Navas, M McArthur, WS Noble, I Dunham, CM Koch, RM Andrews, GK Clelland, S Wilcox, JC Fowler, KD James, P Groth, OM Dovey, PD Ellis, VL Wraight, AJ Mungall, P Dhami, H Fiegler, CF Langford, NP Carter, D Vetrie, M Snyder, G Euskirchen, AE Urban, U Nagalakshmi, J Rinn, G Popescu, P Bertone, S Hartman, J Rozowsky, O Emanuelsson, T Royce, S Chung, M Gerstein, Z Lian, J Lian, Y Nakayama, S Weissman, V Stoic, W Tongprasit, H Sethi, S Jones, M Marra, H Shin, J Schein, M Clamp, K Lindblad-Toh, J Chang, DB Jaffe, ES Kamal, ES Lander, TS Mikkelsen, J Vinson, MC Zody, PJ de Jong, K Osoegawa, M Nefedov, B Zhu, AD Baxevanis, TG Wolfsberg, GE Crawford, E Holt, TJ Vasicek, D Zhou, S Luo, ED Green, GG Bouffard, EH Margulies, ME Portnoy, NF Hansen, PJ Thomas, JC Mcdowell, B Maskeri, AC Young. The ENCODE (ENCyclopedia of DNA elements) Project. Science, vol. 306, pp. 636-640 (2004). doi:10.1126/science.1105136
Paul Muir, Shantao Li, Shaoke Lou, Daifeng Wang, Daniel J Spakowicz, Leonidas Salichos, Jing Zhang, George M. Weinstock, Farren Isaacs, Joel Rozowsky, Mark Gerstein. The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biology, vol. 17, p. 53 (2016). doi:10.1186/s13059-016-0917-0