Authors: Xiangtao Liu, Shizhong Han, Zuoheng Wang, Joel Gelernter, Bao-Zhu Yang
DOI: 10.1371/JOURNAL.PONE.0075619
Keywords: Whole genome sequencing, Computational biology, Exome sequencing, Sanger sequencing, Genome-wide association study, Exome, Gold standard (test), DNA sequencing, Genetics, Comparative genomics, Biology
Abstract: Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied them to whole exome sequencing data taken from 20 individuals. We obtained genotypes generated by the Illumina Infinium HumanExome v1.1 Beadchip for validation analysis and then used Sanger sequencing as a "gold-standard" method to resolve discrepancies for selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers, and the pairwise overlapping rate was about 0.9. Compared with the other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio from GATK was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing were more accurate than those from the exome array, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over the other variant callers for general purpose NGS analyses. The GATK pipelines we developed perform very well.
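
The evaluation metrics named in the abstract (Ti/Tv ratio, sensitivity, specificity) can be illustrated with a minimal Python sketch. This is not the authors' code: the variant representation as `(chrom, pos, ref, alt)` tuples, the site representation as `(chrom, pos)` pairs, and all function names are hypothetical choices made for illustration, and the metrics follow their standard textbook definitions rather than any pipeline-specific variant described in the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representations, assumed for this sketch only.
Variant = Tuple[str, int, str, str]   # (chrom, pos, ref, alt)
Site = Tuple[str, int]                # (chrom, pos)

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(variants: Iterable[Variant]) -> float:
    """Ratio of transitions to transversions among single-nucleotide variants."""
    ti = tv = 0
    for _chrom, _pos, ref, alt in variants:
        # Skip indels, ambiguous bases, and records where ref equals alt.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions found; Ti/Tv is undefined")
    return ti / tv


def sensitivity(called: Set[Site], truth: Set[Site]) -> float:
    """Fraction of true variant sites that the caller recovered: TP / (TP + FN)."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(called & truth) / len(truth)


def specificity(called: Set[Site], truth: Set[Site], assayed: Set[Site]) -> float:
    """Fraction of truly non-variant assayed sites left uncalled: TN / (TN + FP)."""
    negatives = assayed - truth
    if not negatives:
        raise ValueError("no non-variant sites among the assayed sites")
    false_positives = called & negatives
    return (len(negatives) - len(false_positives)) / len(negatives)
```

In this sketch, `truth` stands in for whatever gold standard is available (array genotypes, Sanger calls, or simulated data), and `assayed` is the set of sites at which a call could have been made. Specificity is only meaningful relative to that set, which is why it is passed explicitly.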