A “Quality-First” Credo for the Human Genome Project

Authors: Maynard Olson, Phil Green

DOI: 10.1101/GR.8.5.414

Keywords: GenBank; Positional cloning; Sequence (medicine); Path (graph theory); Sequence assembly; Machine learning; Reference genome; Artificial intelligence; Biology; Reference data; Human genome

Abstract: The Human Genome Project is lurching toward large-scale genomic sequencing. Although we find the arguments in favor of this path compelling, there remain sobering uncertainties about the cost of producing sequence on a gigabase-pair scale, the rate at which the needed sequencing capacity can be developed, and the extent to which compromises will be required in the quality of the final product. Of course, it is these very uncertainties that make sequencing the human genome a scientific and managerial challenge worthy of the committed attention of many hard-working and talented people. Policy decisions are now being made that will greatly affect how this talent is deployed during the years ahead. We argue here, on both scientific and managerial grounds, that it is essential to adopt a “quality-first” credo.

The scientific [argument] for this credo is rooted in a view of how the sequence will be used. The most common uses of reference sequences involve comparisons with other data. A vast number of sequences, derived from many sources (human and nonhuman), will be compared with the reference sequence by future scientists. We should not prejudge either the comparison methods or the questions that these comparisons will address. Our goal should be to produce a reference sequence that is sufficiently good that comparisons will rarely fail or produce misleading results because the sequence is inaccurate or incomplete. The differences detected should, in nearly all cases, reflect real biological effects or limitations of the experimental or theoretical methods, not errors in the reference sequence.

Viewed from this perspective, the commonly accepted base-pair accuracy of 0.9999 should be a minimum target. Current data suggest that two randomly chosen human genomes are likely to show sequence similarity at the 0.999 level. Hence, even at a 10⁻⁴ error rate, on the order of 10% of the discrepancies that arise in intraspecies comparisons will be due to errors in the reference sequence. Each such artifactual discrepancy will lower the efficiency and diminish the discrimination power of future inquiries.

Base-pair accuracy is only one dimension of sequence quality. Indeed, with current technology, it is the least problematic one. The issues of contiguity, clone validation, and assembly checking are more challenging and more crucial from the perspective of users.

Contiguity has proven particularly elusive. Highly fragmented sequence, as is virtually all the human sequence currently in GenBank, is difficult to use and validate. At a contiguity standard of 100 kb, which a substantial portion of GenBank entries [...] meet, many genes are not in a single piece. At a 1-Mb standard, candidate regions associated with positional cloning projects typically still contain gaps. Gaps not only leave users uncertain about the precise origins and content of particular sequences, but they also [...]. Problems of all types accumulate adjacent to gaps because the self-consistency tests on which validation depends [...] regions. Gap sizes are difficult to measure reliably and often prove larger than expected. For segments appreciably below 1 Mb in size, the relative orientations of the segments, and even the ordering of the segments, can be difficult to determine.

Clone validation involves demonstrating that a recombinant DNA molecule is a faithful replica of the genome from which it was derived. At present, the only practical method of validating large-insert clones is to check their consistency with overlapping clones from the same region. This process can be carried out at various resolutions, ranging from restriction-fragment mapping to complete sequencing. Aberrations that occur reproducibly cannot be detected by these methods. However, experience indicates that they are rare. A more serious problem is that it is difficult to carry out clone validation rigorously. Because every clone in most libraries has a unique pair of end points, and given [...] come from [...] haplotypes, clone validation is confounded by haplotype differences.

Assembly checking is needed to ensure that assembly programs have correctly melded the individual sequence tracts to form the composite sequence. The most widely used method is to compare restriction digests of the clone with the pattern of fragment sizes predicted from the assembled sequence. [...] suspect [...] undetected errors, some trends in this area bear watching. For example, increasing reliance on shotgun sampling of relatively large clones (e.g., 150- to 200-kb BAC clones) increases the likelihood of assembly errors and makes them more difficult to detect.

As this brief survey indicates, significant technical challenges in assessing sequence quality must be met. In addition to the scientific rationale [...]

Corresponding author. E-MAIL phg@u.washington.edu; FAX (206) 6857344. Insight/Outlook
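The arithmetic behind the abstract's "order of 10%" figure can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function name `artifactual_fraction` is hypothetical, and it assumes that reference errors and true sequence differences are both rare and independent, so that the observed discrepancy rate is approximately their sum.

```python
def artifactual_fraction(error_rate: float, divergence: float) -> float:
    """Fraction of observed discrepancies caused by reference-sequence errors.

    Assumption (not from the source): errors and true differences are rare
    and independent, so the observed discrepancy rate is roughly their sum.
    """
    return error_rate / (error_rate + divergence)


# Figures quoted in the abstract:
#   base-pair accuracy 0.9999 -> error rate of 1e-4
#   similarity of 0.999 between two genomes -> true difference rate of 1e-3
fraction = artifactual_fraction(error_rate=1e-4, divergence=1e-3)
print(f"{fraction:.1%}")  # prints 9.1%
```

Under these assumptions roughly one discrepancy in eleven would be an artifact of the reference, which matches the abstract's order-of-magnitude estimate. A tenfold lower error rate would bring the share down to about 1%.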
