Authors: Michael S. Robeson, David W. Ussery, Trudy M. Wassenaar, Zulema Udaondo, Visanu Wanchai
DOI: 10.1038/S42003-020-01626-5
Keywords:
Abstract: In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome, or medoid, identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species.
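The medoid-as-proxy step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the genome IDs, the distance values, and the helper names (`symmetric`, `find_medoid`, `classify`) are all hypothetical, and in practice the pairwise distances would come from a tool such as Mash rather than being typed in by hand. The sketch only shows the general idea of picking, for each group, the member with the smallest total distance to the others, then assigning a new genome to the group whose medoid is nearest.

```python
from typing import Dict, Sequence, Tuple

DistMatrix = Dict[str, Dict[str, float]]


def symmetric(pairs: Dict[Tuple[str, str], float]) -> DistMatrix:
    """Expand a dict of pairwise distances into a symmetric lookup table."""
    dist: DistMatrix = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {a: 0.0})[b] = d
        dist.setdefault(b, {b: 0.0})[a] = d
    return dist


def find_medoid(members: Sequence[str], dist: DistMatrix) -> str:
    """Return the member with the smallest summed distance to all members."""
    return min(members, key=lambda g: sum(dist[g][other] for other in members))


def classify(query_dist: Dict[str, float], medoids: Dict[str, str]) -> str:
    """Assign a query to the group whose medoid is closest.

    query_dist maps medoid genome ID -> distance from the query genome.
    medoids maps group label -> medoid genome ID.
    """
    return min(medoids, key=lambda group: query_dist[medoids[group]])


# Toy data: invented distances between hypothetical genomes g1..g6.
dist = symmetric({
    ("g1", "g2"): 0.010, ("g1", "g3"): 0.020, ("g2", "g3"): 0.015,
    ("g4", "g5"): 0.010, ("g4", "g6"): 0.030, ("g5", "g6"): 0.012,
})

# Hypothetical group membership; the labels are borrowed from the abstract.
groups = {"A": ["g1", "g2", "g3"], "B1": ["g4", "g5", "g6"]}

# One representative genome per group.
medoids = {label: find_medoid(members, dist) for label, members in groups.items()}

# Distances from a new, unclassified genome to each medoid (invented values).
query_to_medoids = {"g2": 0.040, "g5": 0.008}
assigned = classify(query_to_medoids, medoids)
```

Comparing a query against one medoid per group, instead of against every genome in the dataset, is what makes this kind of proxy classification cheap enough to apply to a very large number of samples.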