From partial to whole genome imputation of SARS-CoV-2 for epidemiological surveillance

Authors: Dopazo J, Garcia F, Ortuno FM, Perez-Florido J, Casimiro-Soriguer CS

DOI: 10.1101/2021.04.13.439668

Keywords:

Abstract: Background: The current SARS-CoV-2 pandemic has emphasized the utility of viral whole genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is increasingly producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete, and therefore useless, sequences. Viral sequences evolve in the context of a phylogeny, so different positions are in linkage disequilibrium. Therefore, an imputation method would be able to predict missing positions from the available data. Results: We developed impuSARS, an application that includes Minimac, the most widely used strategy for genomic data imputation, and, taking advantage of the enormous amount of data available, a reference panel containing 239,301 sequences was built. impuSARS was tested in a wide range of conditions (continuous fragments, amplicons or sparse individual positions missing), showing great fidelity when reconstructing the original sequences. It can also impute whole genomes from commercial kits covering less than 20% of the genome, or only the Spike protein, with a precision of 0.96. It recovers the lineage with 100% precision for almost all lineages, even in very poorly covered genomes. Conclusions: Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many low-quality sequences that would otherwise be discarded. impuSARS can be incorporated into any primary data processing pipeline for SARS-CoV-2 sequencing.
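
The idea that linked positions allow missing ones to be predicted from a reference panel can be illustrated with a toy sketch. This is not the Minimac algorithm and not the impuSARS implementation: it simply copies missing bases from the closest panel sequence. The function names (`impute`, `recovered_fraction`), the `N` marker for missing bases, and the three short panel sequences are all hypothetical and chosen only for illustration.

```python
from typing import List

MISSING = "N"  # assumed marker for an unsequenced position


def mismatches_on_observed(query: str, ref: str) -> int:
    """Count mismatches between query and ref, ignoring missing query positions."""
    return sum(1 for q, r in zip(query, ref) if q != MISSING and q != r)


def impute(query: str, panel: List[str]) -> str:
    """Fill missing positions from the panel sequence closest on observed positions.

    Toy nearest-neighbour rule; real tools such as Minimac use a
    hidden Markov model over the reference haplotypes instead.
    """
    best = min(panel, key=lambda ref: mismatches_on_observed(query, ref))
    return "".join(b if q == MISSING else q for q, b in zip(query, best))


def recovered_fraction(imputed: str, truth: str, query: str) -> float:
    """Fraction of originally missing positions whose imputed base matches truth."""
    masked = [i for i, q in enumerate(query) if q == MISSING]
    if not masked:
        return 1.0
    return sum(1 for i in masked if imputed[i] == truth[i]) / len(masked)


# Hypothetical aligned reference panel (equal-length sequences).
panel = ["ACGTACGT", "ACGTTCGA", "TCGAACGT"]
truth = "ACGTTCGA"   # full sequence we pretend was only partly sequenced
query = "ACGTTNNN"   # last three positions are missing

imputed = impute(query, panel)
print(imputed, recovered_fraction(imputed, truth, query))
```

In this toy case the observed prefix identifies a single panel sequence, so the missing suffix is recovered exactly. With real data, ties and recombination between lineages are the reason a probabilistic model is preferred over a nearest-neighbour copy.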
