Authors: Xiaoli Jiao, Hiromi Imamichi, Tauseef Rehman, Rishub Nahar, Robin L. Dewar
Keywords: Ebola virus, Viral quasispecies, Amplicon, Antiviral drug, Genetics, RNA virus, Biology, Population, Viral pathogenesis, Virus
Abstract: Many of the infectious diseases that have jeopardized, and still threaten, public health are caused by RNA viruses, including HIV, HCV, influenza virus, Ebola virus, and Zika virus. Because of high rates of mutation and recombination, rapidly evolving viruses exist within a host as a collection of closely related variants, referred to as a viral quasispecies. Uncovering the genetic diversity of a viral quasispecies (i.e., inferring its haplotypes and their proportions in the population) can significantly benefit the study of disease progression, antiviral drug design, vaccine design, and viral pathogenesis. Recent advances in PacBio single-molecule sequencing offer sufficient throughput and contiguous long reads (>10 kb) that cover the full length of most viral genes and genomes, making it possible to reliably profile viral quasispecies populations. However, the relatively high error rate (2% to 15%) of long-read data requires novel analysis methods to deconvolute viral sequences derived from complex mixtures. We examined samples containing mixtures of near-full-length HIV-1 genomes from single molecules, sequenced as 9 kb amplicons directly from PCR products, and developed a signature-based, self-tuning spectral clustering method, called SigClust, to accurately determine the identity (above 99.5%) and the relative abundances of the viral genomes. Results on real influenza benchmark data sets demonstrate the efficacy and superior performance of SigClust.
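The abstract names SigClust's ingredients (signature-based read comparison and self-tuning spectral clustering) but not its internals. As a rough illustration only, the following sketch shows generic spectral clustering with self-tuning local scaling applied to error-prone reads; it is not SigClust. Everything specific here is an assumption: reads encoded as integer bases (0 to 3) at a fixed set of hypothetical "signature" positions, Hamming distance between reads, the neighbour count `k=7`, an eigengap rule for picking the number of haplotypes (the original self-tuning method of Zelnik-Manor and Perona instead uses an eigenvector-rotation cost), and the toy simulator `simulate_reads`.

```python
import numpy as np


def hamming_matrix(reads: np.ndarray) -> np.ndarray:
    """Fraction of mismatching signature positions for every pair of reads."""
    return (reads[:, None, :] != reads[None, :, :]).mean(axis=2)


def self_tuning_affinity(dist: np.ndarray, k: int = 7) -> np.ndarray:
    """Gaussian affinity with per-read local scaling (self-tuning style)."""
    k = min(k, dist.shape[0] - 1)
    # sigma_i = distance to the k-th nearest neighbour (index 0 is the read itself).
    sigma = np.maximum(np.sort(dist, axis=1)[:, k], 1e-6)
    affinity = np.exp(-dist**2 / np.outer(sigma, sigma))
    np.fill_diagonal(affinity, 0.0)
    return affinity


def kmeans(points: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(reads: np.ndarray, max_clusters: int = 8, k: int = 7):
    """Return per-read cluster labels and the relative abundance of each cluster."""
    affinity = self_tuning_affinity(hamming_matrix(reads), k)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(affinity.sum(axis=1), 1e-12))
    # Symmetric normalised affinity D^{-1/2} A D^{-1/2}.
    m = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(m)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order
    # Eigengap heuristic (assumption; not the rotation cost of the original method).
    top = min(max_clusters, len(eigvals) - 1)
    n_clusters = int(np.argmax(eigvals[:top] - eigvals[1 : top + 1])) + 1
    emb = eigvecs[:, :n_clusters]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    labels = kmeans(emb, n_clusters)
    return labels, np.bincount(labels, minlength=n_clusters) / len(labels)


def simulate_reads(hap: np.ndarray, n: int, error_rate: float, rng) -> np.ndarray:
    """Hypothetical toy simulator: copies of a haplotype with random substitutions."""
    errors = rng.random((n, hap.size)) < error_rate
    shift = rng.integers(1, 4, size=(n, hap.size))  # shift of 1..3 always changes the base
    return (hap[None, :] + errors * shift) % 4


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hap_a = rng.integers(0, 4, size=200)
    hap_b = hap_a.copy()
    idx = rng.choice(200, size=60, replace=False)
    hap_b[idx] = (hap_b[idx] + rng.integers(1, 4, size=60)) % 4
    reads = np.vstack([simulate_reads(hap_a, 60, 0.05, rng),
                       simulate_reads(hap_b, 40, 0.05, rng)])
    labels, proportions = spectral_cluster(reads)
    print(proportions)
```

The local scale `sigma_i` lets each read set its own neighbourhood width, which is the "self-tuning" idea: reads carrying more sequencing errors get a wider kernel instead of being cut off by a single global bandwidth. Cluster sizes divided by the read count then serve as a naive estimate of haplotype abundance in this toy setting.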