Author: Timo Lassmann
DOI: 10.1186/S12859-015-0454-Y
Keywords: 2 base encoding, Biology, Hybrid genome assembly, Hidden Markov model, Sequence assembly, DNA sequencing theory, Data mining, ABI Solid Sequencing, Shotgun sequencing, Genetics, Massive parallel sequencing
Abstract: Arguably the most basic step in the analysis of next generation sequencing (NGS) data involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single-end and paired-end libraries, as well as libraries containing unique molecular identifiers, is fully supported. Two additional post-processing steps are included to exclude known contaminants and to filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together, TagDust2 is a feature rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net .