Authors: Braxton D Mitchell, Myriam Fornage, Patrick F McArdle, Yu-Ching Cheng, Sara L Pulit
Keywords:
Abstract: Genome-wide association studies (GWAS) are widely applied to identify susceptibility loci for a variety of diseases using genotyping arrays that interrogate known polymorphisms throughout the genome. A particular strength of GWAS is that it is unbiased with respect to specific genomic elements (e.g., coding or regulatory regions of genes), and it has revealed important associations that would never have been suspected based on prior knowledge or assumptions. To date, the SNPs discovered to be associated with complex human traits tend to have small effect sizes, requiring very large sample sizes to achieve robust statistical power. To address these issues, a number of efficient strategies have emerged for conducting GWAS, including combining results across multiple studies through meta-analysis, collecting cases through electronic health records, and using samples collected by other studies as controls, since these have already been genotyped and made publicly available (e.g., through deposition of de-identified data into dbGaP or EGA).

In certain scenarios, it may be attractive to use such previously genotyped controls and divert resources to the standardized collection, phenotyping, and genotyping of cases only. This strategy, however, requires careful attention to the choice of “public controls” and to the comparability of genetic data between cases and public controls, to ensure that any allele frequency differences observed between the groups are attributable to locus-specific effects rather than to systematic bias due to poor matching (population stratification) or differential genotype calling (batch effects).

The goal of this paper is to describe some potential pitfalls in using previously genotyped control data. We focus on considerations related to the choice of control groups, the use of different genotyping platforms, and approaches to deal with population stratification when cases and controls are genotyped on different platforms.
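The distinction between locus-specific effects and systematic bias can be made concrete with a common diagnostic that the abstract does not itself name: the genomic-control inflation factor λ, the median per-SNP association chi-square divided by its expected median under the null (about 0.4549 for 1 degree of freedom). The Python below is a minimal sketch, assuming per-SNP allele counts are already available for cases and public controls; the function names and toy counts are hypothetical illustrations, not the authors' method. A λ close to 1 is consistent with most SNPs showing no systematic case/control difference, whereas a λ well above 1 can point to genome-wide inflation such as that produced by population stratification or batch effects.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approx. 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of allele counts.

    Hypothetical helper: rows are cases and controls, columns are alt and ref allele counts.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def genomic_inflation(chi2_stats):
    """Genomic-control lambda: observed median chi-square over the null median (1 df)."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical per-SNP allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
    snp_counts = [(30, 70, 50, 50), (100, 100, 100, 100), (48, 52, 52, 48)]
    stats = [allelic_chi2(*counts) for counts in snp_counts]
    print("per-SNP chi-square:", stats)
    print("lambda_GC:", genomic_inflation(stats))
```

In practice λ is computed over many thousands of SNPs, so a three-SNP toy example like this one only shows the mechanics, not a meaningful estimate.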