Variant Callers for Next-Generation Sequencing Data: A Comparison Study

Authors: Xiangtao Liu, Shizhong Han, Zuoheng Wang, Joel Gelernter, Bao-Zhu Yang

DOI: 10.1371/JOURNAL.PONE.0075619

Keywords: Whole genome sequencing, Computational biology, Exome sequencing, Sanger sequencing, Genome-wide association study, Exome, Gold standard (test), DNA sequencing, Genetics, Comparative genomics, Biology

Abstract: Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied the pipelines to whole exome sequencing data taken from 20 individuals. We obtained genotypes generated by the Illumina Infinium HumanExome v1.1 Beadchip for validation analysis and then used Sanger sequencing as a "gold-standard" method to resolve discrepancies for selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers and the pairwise overlapping rate was about 0.9. Compared with the other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio out of GATK was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing versus the exome array were more accurate, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over the other variant callers for general purpose NGS analyses. The GATK pipelines we developed perform very well.
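Two of the summary metrics named in the abstract, the Ti/Tv ratio and the pairwise overlapping rate between callers, can be illustrated with a minimal Python sketch. This is not the authors' code. The tuple representation of a variant, the function names, and the toy call sets are hypothetical, and the overlap is computed here as intersection over union, which is an assumption; the record does not state which definition the study used.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant representation: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(variants: Iterable[Variant]) -> float:
    """Return transitions / transversions over single-nucleotide variants only."""
    ti = tv = 0
    for _chrom, _pos, ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # skip indels and non-variant records
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions: Ti/Tv is undefined")
    return ti / tv


def overlap_rate(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Assumed definition: shared calls divided by the union of both call sets."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 0.0


# Toy call sets for two hypothetical callers (not data from the study).
caller_a = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 300, "A", "C")}
caller_b = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "T")}

print(ti_tv_ratio(caller_a))
print(overlap_rate(caller_a, caller_b))
```

Variants are keyed on position and alleles, so two callers count as agreeing only when they report the same alternate allele at the same site; a looser, position-only comparison would give a higher overlap.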
