Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

Authors: Amit Kawalia, Susanne Motameny, Stephan Wonczak, Holger Thiele, Lech Nieroda

DOI: 10.1371/JOURNAL.PONE.0126321

Abstract: Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in a rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. To run these analyses fast, automated workflows implemented on high performance computers are the state of the art. Although high performance computing (HPC) systems provide sufficient compute power and storage to meet the NGS challenge, they require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we used in implementing our exome processing workflow. It may serve as a guideline for other analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.
