Authors: David J. Cutler, Michael E. Zwick, Minerva M. Carrasquillo, Christopher T. Yohn, Katherine P. Tobin
DOI: 10.1101/GR.197201
Keywords:
Abstract: The central goal of human genetics is to identify, characterize, and ultimately understand the specific DNA variants that contribute to phenotypes in general, and to disease in particular (Lander and Schork 1994; Chakravarti 1999; Zwick et al. 2000, 2001; On-line Mendelian Inheritance in Man 2001). The genetic approach to this problem is, in principle, straightforward. First, we identify individuals showing phenotypic variation for the trait of interest. Second, we genotype variants, such as microsatellites or SNPs, in all individuals in a study. Third, we perform appropriate statistical tests to find any variants correlated with the phenotype. Finally, if such variants are found, additional experiments are performed to demonstrate a causal relationship. Step two poses a question: What variants should be examined? The answer to this question must balance technological and practical considerations. Nevertheless, in the best of all worlds, a researcher would be able to determine every base in every sample, that is, complete resequencing of the entire genome of all individuals under study. No technology currently exists to do this in an economical manner. Moreover, any technology used for this purpose must be capable of extraordinary accuracy. Nucleotide diversity in the general population is ∼8 × 10⁻⁴ per site (Cargill Halushka International SNP Map Working [TISMW] Group Venter study). This implies that a randomly selected chromosome will differ from the reference sequence at ∼8 in 10,000 bases. Now, imagine a technology that allowed one to rapidly and inexpensively determine each individual nucleotide of interest with an accuracy of 99.9%. Such a technology would be remarkable, but insufficient. A technology that is only 99.9% accurate makes 10 errors in 10,000 bases. Because the true rate of variation is eight in 10,000, 55.5% of all identified variants would be errors. This is unacceptably high. The error rate needs to be much lower. Microarrays are inherently parallel devices that offer the promise of determining many genotypes with a limited level of effort (Fodor 1991; Southern 1992; Pease McGall 1996; Lipshutz 1999). Variation Detection Arrays (VDAs) manufactured by Affymetrix have been used to this end with some success (Chee Hacia 1996, 1998a,b, 1999, 2000; Collins Wang 1998). Unfortunately, it has also been reported that between 12% and 45% of the detected variants are false. This indicates that VDAs are, on average, between 99.99% and 99.93% accurate.
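The arithmetic behind these percentages can be checked with a short sketch. It assumes a simplified model that the text does not spell out: every true variant is detected, and every miscalled base is reported as a variant. The function names are hypothetical and chosen only for illustration.

```python
def false_fraction(accuracy, diversity):
    """Fraction of reported variants that are errors.

    Assumed model (not stated in the source): all true variants are
    found, and each miscalled base shows up as a spurious variant.
    """
    error_rate = 1.0 - accuracy
    return error_rate / (error_rate + diversity)


def implied_accuracy(false_frac, diversity):
    """Per-base accuracy implied by an observed false fraction (inverse of the above)."""
    error_rate = false_frac * diversity / (1.0 - false_frac)
    return 1.0 - error_rate


if __name__ == "__main__":
    diversity = 8e-4  # ~8 variant sites per 10,000 bases, as quoted above

    # 99.9% accuracy: 10 errors against 8 true variants per 10,000 bases.
    print(f"false fraction at 99.9% accuracy: {false_fraction(0.999, diversity):.3f}")

    # Reported false fractions of 12% and 45% mapped back to per-base accuracy.
    for frac in (0.12, 0.45):
        print(f"false fraction {frac:.0%} -> accuracy {implied_accuracy(frac, diversity):.5f}")
```

Under this model the false fraction is simply errors / (errors + true variants), which is why a seemingly small per-base error rate dominates when true variation is rare.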
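Although microarrays may be, on average, insufficiently accurate, it is certainly possible that a large fraction of calls are, in fact, more accurate than average, while a smaller fraction are less accurate. We here construct an objective framework to distinguish calls that can be made reliably from those that are less reliable. The need to build such a framework is not a new idea (Southern 1992). Our objectives strive for some of the accomplishments of Green and colleagues (Nickerson 1997; Ewing 1998; Gordon Rieder 1998) in automated sequencing, namely the assignment of a quality score to each call, such that calls with larger scores are more likely to be correct. Green and colleagues have, in fact, done even more; phred provides a quality score that not only increases with increasing accuracy, but is also a direct estimate of the probability that the call is correct. Researchers performing sequencing routinely rely on these quality scores (Ewing 1998), in conjunction with certain other neighborhood rules (Altshuler Mullikin 2000), to achieve extremely high accuracy in SNP discovery (T.I.S.M.W.). This work attempts the same task. An algorithm is developed to assign each VDA call a quality score. Certain simple rules are then applied, and sites at which extraordinarily high confidence can be placed are distinguished from less reliable sites. In contrast to approaches that employ haploid targets, this method can be applied to both haploid and diploid targets. We call this system ABACUS (from Adaptive Background Calling Scheme, see below) and show that accuracy greater than 99.9999% can be achieved at >80% of the sites on a VDA.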
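To relate a probability-based quality score to the accuracy figures quoted above, the following minimal sketch uses the standard phred convention, Q = -10 log10(P_error). That formula is the usual phred definition and is an assumption here; this text does not state it, nor does it specify how ABACUS scores are scaled. The function names are hypothetical.

```python
import math


def phred_quality(error_probability):
    """Phred-scaled quality for a given probability that a call is wrong."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Inverse mapping: probability that a call is wrong at a given quality."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # Error probabilities matching 99.9%, 99.99% and 99.9999% accuracy.
    for p_err in (1e-3, 1e-4, 1e-6):
        print(f"P(error) = {p_err:g} -> Q = {phred_quality(p_err):.1f}")
```

On this scale, each additional 10 points corresponds to a tenfold drop in error probability, so an accuracy above 99.9999% would sit at roughly Q60 or higher.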