Authors: Dopazo J, Garcia F, Ortuno FM, Perez-Florido J, Casimiro-Soriguer CS
DOI: 10.1101/2021.04.13.439668
Keywords:
Abstract: Background: The current SARS-CoV-2 pandemic has emphasized the utility of viral whole genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete, and therefore useless, sequences. Viruses evolve in the context of a phylogeny in which different positions are in linkage disequilibrium. Therefore, an imputation method would be able to predict missing positions from the available data. Results: We developed impuSARS, an application that includes Minimac, the most widely used strategy for genomic data imputation. Taking advantage of the enormous number of sequences available, we also built a reference panel containing 239,301 sequences. impuSARS was tested under a wide range of conditions (missing continuous fragments, missing amplicons or missing sparse individual positions) and showed great fidelity when reconstructing the original sequences. It can also impute whole genomes from commercial kits covering less than 20% of the genome, or covering only the Spike protein, with a precision of 0.96. It recovers the lineage with 100% precision for almost all lineages, even for very poorly covered genomes. Conclusions: impuSARS can improve the pace of SARS-CoV-2 genome production by recovering many low-quality sequences that would otherwise be discarded. It can be incorporated into any primary processing pipeline for SARS-CoV-2 sequencing.
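The abstract's premise is that positions in a viral phylogeny are in linkage disequilibrium, so missing bases can be predicted from observed ones by comparing against a reference panel. The Python sketch below illustrates only that intuition. It is a deliberately simplified nearest-neighbour scheme, not Minimac (which models haplotypes with a hidden Markov model) and not the impuSARS pipeline. The function name `impute`, the use of `N` to mark missing bases, and the requirement that all sequences are pre-aligned to the same length are illustrative assumptions.

```python
from collections import Counter

MISSING = "N"  # assumption: missing bases are encoded as 'N'


def impute(query: str, panel: list[str]) -> str:
    """Toy reference-panel imputation (hypothetical helper, NOT Minimac).

    1. Score each reference by its mismatches at the query's observed positions.
    2. Keep the reference(s) with the fewest mismatches.
    3. Fill each missing position by majority vote among those closest references.
    """
    if not panel:
        raise ValueError("reference panel is empty")
    if any(len(ref) != len(query) for ref in panel):
        raise ValueError("all sequences must be aligned to the same length")

    observed = [i for i, base in enumerate(query) if base != MISSING]

    # Mismatch count per reference, using only positions that were actually sequenced.
    scores = [sum(ref[i] != query[i] for i in observed) for ref in panel]
    best = min(scores)
    closest = [ref for ref, score in zip(panel, scores) if score == best]

    result = []
    for i, base in enumerate(query):
        if base != MISSING:
            result.append(base)  # keep observed bases unchanged
        else:
            votes = Counter(ref[i] for ref in closest)
            result.append(votes.most_common(1)[0][0])
    return "".join(result)


if __name__ == "__main__":
    panel = ["ACGTACGT", "ACGTTCGA", "TCGAACGT"]
    # Position 4 is missing; "ACGTTCGA" is the only reference matching every observed base.
    print(impute("ACGTNCGA", panel))  # prints ACGTTCGA
```

Real imputation tools replace the single "closest reference" step with a probabilistic model that lets the query switch between reference haplotypes along the genome, which is what makes them robust to recombination and sequencing errors.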