Authors: I. Ionita-Laza, C. Lange, N. M. Laird
Keywords:
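Abstract: The different genetic variation discovery projects (The SNP Consortium, the International HapMap Project, the 1000 Genomes Project, etc.) aim to identify as much as possible of the underlying genetic variation in various human populations. The question we address in this article is how many new variants are yet to be found. This is an instance of the species problem in ecology, where the goal is to estimate the number of species in a closed population. We use a parametric beta-binomial model that allows us to calculate the expected number of new variants with a desired minimum frequency to be discovered in a new dataset of individuals of a specified size. The method can also be used to predict the number of individuals necessary to sequence in order to capture all (or a fraction of) the variation with a specified minimum frequency. We apply the method to three datasets: the ENCODE dataset, the SeattleSNPs dataset, and the National Institute of Environmental Health Sciences SNPs dataset. Consistent with previous descriptions, our results show that the African population is the most diverse in terms of the number of variants expected to exist, the Asian populations the least diverse, with the European population in-between. In addition, our results show a clear distinction between the Chinese and the Japanese populations, with the Japanese population being the less diverse. To find all common variants (frequency at least 1%), the number of individuals that need to be sequenced is small (∼350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%). The data reveal a rule of diminishing returns: a small number of individuals (∼150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number of individuals (> 3,000 individuals) is necessary to find all of those variants. Finally, our results also show a much higher diversity in environmental response genes compared with the average genome, especially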
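To make the beta-binomial idea concrete, the following is a minimal sketch, not the authors' estimator. It assumes each variant's population frequency p is drawn from a Beta(a, b) distribution and that a variant counts as discovered if it appears at least once among n sampled chromosomes. The function names, the shape parameters `a` and `b`, the threshold `f_min`, and the numeric values in the usage loop are all hypothetical choices for illustration; they are not fitted values from the ENCODE, SeattleSNPs, or NIEHS data.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_detected(n, a, b):
    """P(a variant is seen at least once in n chromosomes) when p ~ Beta(a, b).

    Uses E[(1 - p)^n] = B(a, b + n) / B(a, b) for the probability of a miss.
    """
    return 1.0 - math.exp(log_beta(a, b + n) - log_beta(a, b))


def fraction_detected(n, a, b, f_min, steps=20000):
    """Expected fraction of variants with frequency >= f_min seen in n chromosomes.

    Integrates the Beta(a, b) density restricted to [f_min, 1] with a simple
    midpoint rule (an assumption made to keep the sketch dependency-free).
    """
    width = (1.0 - f_min) / steps
    seen = 0.0
    total = 0.0
    for i in range(steps):
        p = f_min + (i + 0.5) * width  # midpoint of the i-th sub-interval
        density = p ** (a - 1.0) * (1.0 - p) ** (b - 1.0)
        total += density
        seen += density * (1.0 - (1.0 - p) ** n)
    return seen / total


if __name__ == "__main__":
    # Hypothetical shape parameters; two chromosomes per diploid individual.
    a, b = 0.2, 2.0
    for individuals in (150, 350, 3000):
        n = 2 * individuals
        share = fraction_detected(n, a, b, f_min=0.001)
        print(f"{individuals} individuals: {share:.3f} of variants with p >= 0.1%")
```

Under these assumptions the detected share rises quickly for small samples and then flattens, which is the qualitative pattern of diminishing returns described in the abstract; the specific numbers printed depend entirely on the illustrative parameters.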