Accurate Reconstruction of Microbial Strains from Metagenomic Sequencing Using Representative Reference Genomes

Authors: Zhemin Zhou, Nina Luhmann, Nabil-Fareed Alikhan, Christopher Quince, Mark Achtman

DOI: 10.1007/978-3-319-89929-9_15

Keywords: Statistical model, Taxonomic rank, False positive paradox, Genome, Computational biology, Identification (information), Computer science, Genetic diversity, Data structure, Metagenomics

Abstract: Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying these reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g., detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference genome databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with \(\le 0.02\%\) abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones.
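The idea of grouping genomes into similarity-based hierarchical clusters that can be updated incrementally can be illustrated with a minimal sketch. This is not the SPARSE implementation: the k-mer Jaccard similarity, the three threshold values, the first-member-as-representative rule, and every name below (`kmers`, `jaccard`, `IncrementalClusters`) are assumptions chosen only to keep the example small.

```python
# Illustrative sketch only; not the SPARSE code base.
# Assumptions: genomes are compared by Jaccard similarity of k-mer sets,
# and the similarity thresholds below are hypothetical.

def kmers(seq, k=4):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class IncrementalClusters:
    """Assign each new genome to nested clusters, coarse to fine."""

    def __init__(self, thresholds=(0.3, 0.6, 0.9)):
        self.thresholds = thresholds
        # reps[level] maps a cluster path (tuple) to its representative
        # k-mer set, taken from the first genome placed in that cluster.
        self.reps = [dict() for _ in thresholds]
        self.assignments = {}

    def add(self, name, seq):
        sig = kmers(seq)
        path = ()
        for level, threshold in enumerate(self.thresholds):
            match = None
            # Only clusters nested inside the parent chosen so far qualify.
            for cpath, rep in self.reps[level].items():
                if cpath[:-1] == path and jaccard(sig, rep) >= threshold:
                    match = cpath
                    break
            if match is None:
                # No close representative: open a new cluster at this level.
                siblings = sum(1 for c in self.reps[level] if c[:-1] == path)
                match = path + (siblings,)
                self.reps[level][match] = sig
            path = match
        self.assignments[name] = path
        return path


index = IncrementalClusters()
index.add("genome_a", "ACGTACGTACGTACGT")  # first genome opens (0, 0, 0)
index.add("genome_b", "ACGTACGTTTTT")      # similar only at the coarsest level
index.add("genome_c", "TTTTGGGGCCCCAAAA")  # unrelated, opens a new top cluster
```

Because a new genome is compared only against cluster representatives inside the parent it has already joined, adding it does not require re-clustering the genomes already indexed, which is the property the word "incremental" refers to here.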
