Detailed comparison of two popular variant calling packages for exome and targeted exon studies

Authors: Charles D. Warden, Aaron W. Adamson, Susan L. Neuhausen, Xiwei Wu

DOI: 10.7717/PEERJ.600

Abstract: The Genome Analysis Toolkit (GATK) is commonly used for variant calling of single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from short-read sequencing data aligned against a reference genome. There have been a number of comparisons against GATK, but an equally comprehensive comparison for VarScan has not yet been performed. More specifically, we compare (1) the effects of different pre-processing steps prior to variant calling with both GATK and VarScan, (2) VarScan variants called with increasingly conservative parameters, and (3) filtered and unfiltered GATK variant calls (for both the UnifiedGenotyper and the HaplotypeCaller). Variant calling was performed on three datasets (1 targeted exon dataset and 2 exome datasets), each with approximately a dozen subjects. In most cases, pre-processing steps (e.g., indel realignment and quality score base recalibration using GATK) had only a modest impact on the variant calls, but their importance varied between variant callers. Based upon the concordance statistics presented in this study, we recommend that users focus on "high-quality" GATK variant calls by filtering out variants flagged as low-quality. We also found that running VarScan with a conservative set of parameters (referred to as "VarScan-Cons") resulted in a reproducible list of variants, with high concordance (>97%) to high-quality GATK HaplotypeCaller calls. These conservative parameters resulted in decreased sensitivity, but VarScan-Cons could still recover 84-88% of the high-quality GATK SNPs in these datasets. This study also provides limited evidence that VarScan-Cons has a lower false positive rate among novel variants (relative to high-quality GATK SNPs) and that the HaplotypeCaller has an increased false positive rate for indels (relative to … indels). More broadly, we believe these metrics can be useful for assessing variant calls in the context of a specific experimental design. As an example, there are two additional…
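The abstract leans on two operations it does not spell out: keeping only calls not flagged as low-quality, and computing concordance between two call sets. The following is a minimal Python sketch under stated assumptions, not the study's own method. It assumes that a variant is matched on (CHROM, POS, REF, ALT), that "high-quality" means a VCF FILTER value of `PASS` (or `.`), and that concordance is the fraction of one call set also found in the other. The helper names `parse_vcf_lines` and `concordance` and the toy records are hypothetical and invented for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str], pass_only: bool = False) -> Set[Variant]:
    """Collect variant keys from tab-separated VCF data lines.

    With pass_only=True, keep only records whose FILTER column is "PASS" or "."
    (assumed here to stand in for "high-quality" calls).
    """
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and meta/header lines
        chrom, pos, _id, ref, alt, _qual, filt = line.rstrip("\n").split("\t")[:7]
        if pass_only and filt not in ("PASS", "."):
            continue  # drop records flagged by a filter (e.g. "LowQual")
        # Split multi-allelic sites so each ALT allele is compared on its own.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def concordance(query: Set[Variant], other: Set[Variant]) -> float:
    """Fraction of `query` variants also found in `other` (0.0 if `query` is empty)."""
    if not query:
        return 0.0
    return len(query & other) / len(query)


# Toy call sets (invented for illustration, not data from the study).
gatk_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t45\tPASS\t.",
    "chr1\t300\t.\tG\tA\t12\tLowQual\t.",
    "chr2\t400\t.\tT\tC\t60\tPASS\t.",
]
varscan_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr2\t400\t.\tT\tC\t.\tPASS\t.",
]

gatk_hq = parse_vcf_lines(gatk_vcf, pass_only=True)  # 3 PASS calls
varscan_calls = parse_vcf_lines(varscan_vcf)

print(concordance(varscan_calls, gatk_hq))            # 1.0
print(round(concordance(gatk_hq, varscan_calls), 3))  # 0.667
```

Because concordance is directional, the same helper gives both the share of VarScan calls that agree with high-quality GATK calls and, in the reverse direction, the share of high-quality GATK calls that VarScan recovers (the sensitivity-style figure behind a statement like recovering 84-88% of SNPs). Real comparisons, particularly of indels, usually also require normalizing alleles (left-alignment and trimming), which this sketch omits.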