Authors: Adam H. Freedman, Michele Clamp, Timothy B. Sackton
DOI: 10.1101/585745
Keywords:
Abstract: De novo transcriptome assembly is a powerful tool, widely used over the last decade for making evolutionary inferences. However, it relies on two implicit assumptions: that the assembled transcriptome is an unbiased representation of the underlying expressed transcriptome, and that expression estimates from the assembly are good, if noisy, approximations of the relative abundance of expressed transcripts. Using publicly available data from model organisms, we demonstrate that, across assembly algorithms, species, and data sets, these assumptions are consistently violated. Bias exists at the nucleotide level, with genotyping error rates ranging from 30-83%. As a result, diversity is underestimated in transcriptome assemblies, with consistent under-estimation of heterozygosity in all but the most inbred samples. Even at the gene level, expression estimates show wide deviations from map-to-reference estimates, with a positive bias at lower expression levels. Standard filtering of transcriptome assemblies improves the robustness of gene expression estimates but leads to the loss of a meaningful number of protein coding genes, including many that are highly expressed. We demonstrate a computational method to partly alleviate noise and bias in expression estimates. Researchers should consider ways to minimize the impact of bias in transcriptome assemblies.