Authors: ALICIA R. MARTIN, GERARD TSE, CARLOS D. BUSTAMANTE, EIMEAR E. KENNY
DOI: 10.1142/9789814583220_0024
Keywords:
Abstract: A striking finding from recent large-scale sequencing efforts is that the vast majority of variants in the human genome are rare and found within single populations or lineages. These observations hold important implications for the design of the next round of disease variant discovery efforts: if genetic variants that influence disease risk follow the same trend, then we expect to see population-specific disease associations that require large sample sizes for detection. To address this challenge, and due to the still prohibitive cost of sequencing large cohorts, researchers have developed a new generation of low-cost genotyping arrays that assay rare variation previously identified from large exome sequencing studies. Genotyping approaches rely not only on directly observing variants, but also on phasing and imputation methods that use publicly available reference panels to infer unobserved variants in a study cohort. Rare variant exome arrays are intentionally enriched for variants likely to be disease causing, and here we assay the ability of the first commercially available rare exome variant array (the Illumina Infinium HumanExome BeadChip) to also tag other potentially damaging variants not molecularly assayed. Using full sequence data from chromosome 22 from the phase I 1000 Genomes Project, we evaluate three methods for imputation (BEAGLE, MaCH-Admix, and SHAPEIT2/IMPUTE2) with the rare exome variant array under varied study panel sizes, reference panel sizes, and LD structures via population differences. We find that imputation is more accurate across both the genome and exome for common variant arrays than for the next generation array at all allele frequencies, including rare alleles. We also find that imputation is the least accurate in African populations, and accuracy is substantially improved for rare variants when the same population is included in the reference panel. Depending on the goals of GWAS researchers, our results will aid budget decisions by helping determine whether money is best spent sequencing the genomes of smaller sample sizes, genotyping larger sample sizes with rare and/or common variant arrays and imputing SNPs, or some combination of the two.