Authors: Devon R. O'Rourke, Nicholas A. Bokulich, Michelle A. Jusino, Matthew D. MacManes, Jeffrey T. Foster
DOI: 10.1002/ECE3.6594
Keywords:
Abstract: Metabarcoding studies provide a powerful approach to estimate the diversity and abundance of organisms in mixed communities in nature. While strategies exist for optimizing sample and sequence library preparation, best practices for bioinformatic processing of amplicon sequence data are lacking in animal diet studies. Here we evaluate how decisions made in core bioinformatic processes, including sequence filtering, database design, and classification, can influence animal metabarcoding results. We show that denoising methods have lower error rates compared to traditional clustering methods, although these differences are largely mitigated by removing low-abundance sequence variants. We also found that available reference datasets from GenBank and BOLD for the animal marker gene cytochrome oxidase I (COI) can be complementary, and we discuss methods to improve existing databases to include versioned releases. Taxonomic classification methods can dramatically affect results. For example, the commonly used Barcode of Life Database (BOLD) Classification API assigned fewer names to samples from order through species levels, using both a mock community and bat guano samples, compared to all other classifiers (vsearch-SINTAX and q2-feature-classifier's BLAST + LCA, VSEARCH + LCA, and Naive Bayes classifiers). The lack of consensus on bioinformatics best practices limits comparisons among studies and may introduce biases. Our work suggests that biological mock communities offer a useful standard to evaluate the myriad computational decisions impacting animal metabarcoding accuracy. Further, these comparisons highlight the need for continual evaluations as new tools are adopted to ensure that the inferences drawn reflect meaningful biology instead of digital artifacts.