Authors: Andrea Benazzo, Alex Panziera, Giorgio Bertorelle
DOI: 10.1002/ECE3.1261
Keywords: Parallel processing (DSP implementation), Pipeline (computing), Data mining, Server, Computer science, Source code, Statistics, Unix, File format, Multi-core processor, Set (abstract data type)
Abstract: Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.