A total crapshoot? Evaluating bioinformatic decisions in animal diet metabarcoding analyses.

Authors: Devon R. O'Rourke, Nicholas A. Bokulich, Michelle A. Jusino, Matthew D. MacManes, Jeffrey T. Foster

DOI: 10.1002/ECE3.6594

Abstract: Metabarcoding studies provide a powerful approach to estimate the diversity and abundance of organisms in mixed communities in nature. While strategies exist for optimizing sample and sequence library preparation, best practices for bioinformatic processing of amplicon data are lacking in animal diet studies. Here we evaluate how decisions made in core processes, including filtering, database design, and classification, can influence metabarcoding results. We show that denoising methods have lower error rates than traditional clustering methods, although these differences are largely mitigated by removing low-abundance variants. We also found that available reference datasets from GenBank and BOLD for the marker gene cytochrome oxidase I (COI) can be complementary, and we discuss how to improve existing databases to include versioned releases. Taxonomic classification can dramatically affect results. For example, the commonly used Barcode of Life Database (BOLD) Classification API assigned fewer names to samples from order through species levels, using both a mock community and bat guano samples, than all other classifiers (vsearch-SINTAX, and q2-feature-classifier's BLAST + LCA, VSEARCH, and Naive Bayes classifiers). The lack of consensus on bioinformatic best practices limits comparisons among studies and may introduce biases. Our work suggests that biological mock communities offer a useful standard for evaluating the myriad computational decisions impacting accuracy. Further, we highlight the need for continual evaluations as new tools are adopted, to ensure that the inferences drawn reflect meaningful biology instead of digital artifacts.
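The "BLAST + LCA" classifier named in the abstract assigns a query the lowest common ancestor (LCA) of the taxonomies of its reference hits. The Python sketch below is a minimal illustration under several assumptions. It assumes the hits have already been retrieved and that each hit's taxonomy has been parsed into a list of names ordered from highest to lowest rank. The function `consensus_lca` and its `min_consensus` fraction are hypothetical: a majority-vote variant of LCA, not q2-feature-classifier's actual code. Setting `min_consensus=1.0` reduces it to a strict LCA.

```python
from collections import Counter


def consensus_lca(hit_taxonomies, min_consensus=0.51):
    """Return the deepest lineage shared by at least `min_consensus` of hits.

    Hypothetical helper for illustration only. `hit_taxonomies` is a list of
    lineages, each a list of names ordered from highest rank (e.g. kingdom)
    to lowest (e.g. species). A `min_consensus` above 0.5 guarantees that the
    winning lineage at each depth is unique.
    """
    if not hit_taxonomies:
        return []
    n_hits = len(hit_taxonomies)
    consensus = []
    depth = 0
    while True:
        # Count whole lineage prefixes, not bare names, so that identical
        # names under different parent taxa are never merged.
        prefixes = Counter(
            tuple(tax[: depth + 1]) for tax in hit_taxonomies if len(tax) > depth
        )
        if not prefixes:
            break  # every lineage is exhausted
        prefix, count = prefixes.most_common(1)[0]
        if count / n_hits < min_consensus:
            break  # hits disagree too much at this rank; stop here
        consensus = list(prefix)
        depth += 1
    return consensus
```

Suppose a query matched three hypothetical mosquito references: two *Aedes* species and one *Culex*. The default majority rule keeps the genus *Aedes*, because two of three hits agree. A strict LCA stops at the family Culicidae. This trade-off between assigning names at finer ranks and risking misassignment is one reason classifiers can differ in how many names they report.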