VarBin, a novel method for classifying true and false positive variants in NGS data

Authors: Jacob Durtschi, Rebecca L Margraf, Emily M Coonrod, Kalyan C Mallempati, Karl V Voelkerding

DOI: 10.1186/1471-2105-14-S13-S2

Keywords: Sequence analysis, Genetics, Exome, Genome, Genotype, Exome sequencing, Bin, Proband, Biology, Sanger sequencing

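Abstract: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction. VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position, divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins, with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the IGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4). To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed for 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4. These data indicate that VarBin correctly classifies the majority of true variants as Bin 1, and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2.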
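
The PLRD quantity described in the abstract can be illustrated with a minimal Python sketch. It assumes that PLRD is the Phred-scaled (10 × log10) ratio of the variant genotype likelihood to the wild type genotype likelihood, divided by read depth; the abstract does not give the exact formula. The function names `plrd` and `separation_from_background` are hypothetical, and measuring separation as the gap to the highest background value is only one possible choice, not necessarily the authors' rule.

```python
import math


def plrd(log10_lik_variant, log10_lik_wild_type, depth):
    """Phred-scaled variant-to-wild-type likelihood ratio divided by read depth.

    Assumed form: 10 * log10(L_variant / L_wild_type) / depth.
    """
    if depth <= 0:
        raise ValueError("read depth must be positive")
    return 10.0 * (log10_lik_variant - log10_lik_wild_type) / depth


def separation_from_background(proband_plrd, background_plrds):
    """Gap between the proband PLRD and the highest background PLRD.

    Hypothetical separation measure chosen for illustration only.
    """
    return proband_plrd - max(background_plrds)


# Idealized wild type site: 20 error-free reference reads.
# A heterozygous genotype explains each reference read with probability 0.5,
# a homozygous-reference genotype with probability ~1 (log10 likelihood ~0).
depth = 20
wild_type_site = plrd(depth * math.log10(0.5), 0.0, depth)
```

Under this idealized assumption, `wild_type_site` evaluates to about -3.01 regardless of depth, which is consistent with the abstract's observation that wild type/non-variant calls cluster near -3 PLRD.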
