4P: fast computing of population genetics statistics from large DNA polymorphism panels.

Authors: Andrea Benazzo, Alex Panziera, Giorgio Bertorelle

DOI: 10.1002/ECE3.1261

Keywords: Parallel processing (DSP implementation), Pipeline (computing), Data mining, Server, Computer science, Source code, Statistics, Unix, File format, Multi-core processor, Set (abstract data type)

Abstract: Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analysis of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing on multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses on large panels of genomic data. It is also particularly suitable for analyzing multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.
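
To illustrate the kind of statistic the abstract mentions, the following minimal Python sketch computes a two-population joint frequency spectrum by splitting the polymorphic sites into chunks and merging the per-chunk counts. It is not 4P's source code and does not read 4P's input format. The function names, the diploid genotype coding (0, 1, or 2 copies of the derived allele per individual), and the chunking scheme are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(chunk, pop_a, pop_b):
    """Tally (derived count in A, derived count in B) pairs for one chunk of sites."""
    counts = Counter()
    for genotypes in chunk:
        a = sum(genotypes[i] for i in pop_a)
        b = sum(genotypes[i] for i in pop_b)
        counts[(a, b)] += 1
    return counts


def joint_frequency_spectrum(sites, pop_a, pop_b, workers=2, chunk_size=2):
    """Return the joint spectrum as a (2*|A|+1) x (2*|B|+1) matrix of site counts.

    sites: list of per-site genotype lists (hypothetical coding: 0/1/2 derived copies).
    pop_a, pop_b: lists of individual indices belonging to each population.
    """
    chunks = [sites[i:i + chunk_size] for i in range(0, len(sites), chunk_size)]
    total = Counter()
    # A thread pool keeps the sketch portable; for CPU-bound work on real data,
    # a process pool would be the usual way to use several cores in Python.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda c: count_chunk(c, pop_a, pop_b), chunks):
            total.update(partial)
    rows, cols = 2 * len(pop_a) + 1, 2 * len(pop_b) + 1
    jfs = [[0] * cols for _ in range(rows)]
    for (a, b), n in total.items():
        jfs[a][b] = n
    return jfs


# Toy data: 4 sites, 4 diploid individuals (0 and 1 in population A, 2 and 3 in B).
example_sites = [[0, 1, 2, 0], [1, 1, 0, 0], [0, 1, 2, 0], [2, 2, 2, 2]]
example_jfs = joint_frequency_spectrum(example_sites, pop_a=[0, 1], pop_b=[2, 3])
```

Because each chunk yields an independent table of counts that is simply summed at the end, the per-site work can be distributed across workers without any coordination between them.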
