The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process

Authors: Verena Heinrich, Jens Stange, Thorsten Dickhaus, Peter Imkeller, Ulrike Krüger

DOI: 10.1093/NAR/GKR1073

Abstract: With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement, that is, before sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and show that the variance of allele frequencies at heterozygous loci is higher than expected from a simple binomial distribution. Due to this high variance, mutation callers relying on binomially distributed priors are less sensitive to heterozygous variants whose allele frequency deviates strongly from the expected mean. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.
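The abstract's central point is that a branching amplification step makes allele fractions more variable than binomial read sampling alone would predict. A small simulation can illustrate this. The sketch below is a toy model under stated assumptions, not the authors' framework. It treats PCR as a Galton-Watson branching process in which every molecule is copied with a fixed per-cycle probability (`efficiency`). Each allele starts from a small number of template molecules (`start_copies`), and `depth` reads are then drawn binomially from the amplified pool. The function name, the parameter values and the use of NumPy are all illustrative choices. Write P for the allele fraction in the amplified pool. By the law of total variance, the observed fraction F satisfies Var(F) = E[P(1 - P)]/depth + Var(P). The second term is the excess over a pure binomial model.

```python
import numpy as np


def simulate_allele_fractions(n_loci, start_copies, cycles, efficiency, depth, seed=0):
    """Return (pool_fraction, observed_fraction) for simulated heterozygous loci.

    Toy model (an assumption for illustration, not the paper's protocol):
    each allele starts as `start_copies` template molecules, PCR is a
    Galton-Watson branching process in which every molecule is copied with
    probability `efficiency` per cycle, and `depth` reads are then drawn
    binomially from the amplified pool.
    """
    rng = np.random.default_rng(seed)
    ref = np.full(n_loci, start_copies, dtype=np.int64)
    alt = np.full(n_loci, start_copies, dtype=np.int64)
    for _ in range(cycles):
        # Each existing molecule contributes one new copy with prob. `efficiency`.
        ref += rng.binomial(ref, efficiency)
        alt += rng.binomial(alt, efficiency)
    pool_fraction = alt / (ref + alt)
    # Sequencing step: binomial sampling of reads from the amplified pool.
    alt_reads = rng.binomial(depth, pool_fraction)
    return pool_fraction, alt_reads / depth


if __name__ == "__main__":
    depth = 30
    pool, observed = simulate_allele_fractions(
        n_loci=20_000, start_copies=2, cycles=10, efficiency=0.8, depth=depth
    )
    binomial_var = 0.5 * 0.5 / depth  # variance if reads were Binomial(depth, 0.5)
    # Law of total variance: Var(F) = E[P(1 - P)] / depth + Var(P).
    predicted_var = (pool * (1 - pool)).mean() / depth + pool.var()
    print(f"observed variance of allele fraction: {observed.var():.4f}")
    print(f"pure binomial variance:               {binomial_var:.4f}")
    print(f"law-of-total-variance prediction:     {predicted_var:.4f}")

    # Heterozygous loci with fractions far below 0.5 are the ones a caller
    # assuming Binomial(depth, 0.5) read counts is most likely to miss.
    reference = np.random.default_rng(1).binomial(depth, 0.5, size=pool.size) / depth
    print(f"share of loci with fraction <= 0.2: simulated {np.mean(observed <= 0.2):.4f}, "
          f"pure binomial {np.mean(reference <= 0.2):.4f}")
```

In this toy model, raising `depth` shrinks only the first term of the decomposition. Var(P) is fixed by the amplification step. Averaging the fractions from independent library preparations, each with its own pool, shrinks both terms. This mirrors the abstract's observation about technical replicates versus sequencing depth, but does not reproduce it.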
