Error, noise and bias in de novo transcriptome assemblies

Authors: Adam H. Freedman, Michele Clamp, Timothy B. Sackton

DOI: 10.1101/585745

Abstract: De novo transcriptome assembly is a powerful tool, widely used over the last decade for making evolutionary inferences. However, it relies on two implicit assumptions: that the assembled transcriptome is an unbiased representation of the underlying expressed transcriptome, and that expression estimates from the assembly are good, if noisy, approximations of the relative abundance of expressed transcripts. Using publicly available data for model organisms, we demonstrate that, across assembly algorithms, species, and data sets, these assumptions are consistently violated. Bias exists at the nucleotide level, with genotyping error rates ranging from 30-83%. As a result, diversity is underestimated in transcriptome assemblies, with consistent under-estimation of heterozygosity in all but the most inbred samples. Even at the gene level, expression estimates show wide deviations from map-to-reference estimates, and positive bias at lower expression levels. Standard filtering of transcriptome assemblies improves the robustness of gene expression estimates but leads to the loss of a meaningful number of protein coding genes, including many that are highly expressed. We demonstrate a computational method to partly alleviate noise and bias in expression estimates. Researchers should consider ways to minimize the impact of bias in transcriptome assemblies.
