TagDust2: A generic method to extract reads from sequencing data

Author: Timo Lassmann

DOI: 10.1186/S12859-015-0454-Y

Keywords: 2 base encoding, Biology, Hybrid genome assembly, Hidden Markov model, Sequence assembly, DNA sequencing theory, Data mining, ABI Solid Sequencing, Shotgun sequencing, Genetics, Massive parallel sequencing

Abstract: Arguably the most basic step in the analysis of next generation sequencing (NGS) data involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single-end and paired-end libraries, and of libraries containing unique molecular identifiers, is fully supported. Two additional post-processing steps are included to exclude known contaminants and filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together, TagDust2 is a feature-rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net.
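To make the idea of a "read architecture" concrete, the sketch below splits a raw read into barcode, unique molecular identifier (UMI) and mappable read. It is a deliberately simplified stand-in and not TagDust2's method: it uses fixed segment positions and Hamming-distance barcode matching instead of a hidden Markov model, so it tolerates substitutions but not insertions or deletions. The barcode table, the UMI length, the segment order and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple

# Hypothetical read architecture: [barcode][UMI][mappable read].
# Barcodes and UMI length are illustrative assumptions, not TagDust2 defaults.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
UMI_LEN = 4


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_read(raw: str, max_mismatches: int = 1) -> Optional[Tuple[str, str, str]]:
    """Return (sample, umi, read), or None if the read cannot be assigned."""
    bc_len = len(next(iter(BARCODES)))
    # Reject reads too short to contain barcode, UMI and at least one base.
    if len(raw) <= bc_len + UMI_LEN:
        return None
    prefix = raw[:bc_len]
    # Keep every sample whose barcode is within the mismatch budget.
    hits = [s for bc, s in BARCODES.items() if hamming(prefix, bc) <= max_mismatches]
    # Assign only when exactly one barcode matches; ambiguous reads are dropped.
    if len(hits) != 1:
        return None
    umi = raw[bc_len:bc_len + UMI_LEN]
    return hits[0], umi, raw[bc_len + UMI_LEN:]
```

For example, `extract_read("ACGAGGCCTTTT")` assigns the read to `sample_A` despite one barcode mismatch, while a prefix that is far from every barcode yields `None`. A probabilistic model such as the HMM described in the abstract can additionally account for indels and score competing architectures, which this fixed-offset sketch cannot.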

References (22)
Teemu Kivioja, Anna Vähärautio, Kasper Karlsson, Martin Bonke, Sten Linnarsson, Jussi Taipale, Counting absolute number of molecules using unique molecular identifiers. Nature Precedings, pp. 1-1 (2011), 10.1038/NPRE.2011.5903.1
Jeffrey Parvin, Terry Camerlengo, Pearlly Yan, Kun Huang, Raghuram Onti-Srinivasan, Tim Huang, Hatice Gulcin Ozer, From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data. AMIA Joint Summits on Translational Science proceedings, vol. 2012, pp. 1-10 (2012)
Akira Hasegawa, Carsten Daub, Piero Carninci, Yoshihide Hayashizaki, Timo Lassmann, MOIRAI: a compact workflow system for CAGE analysis. BMC Bioinformatics, vol. 15, pp. 144-144 (2014), 10.1186/1471-2105-15-144
David W Craig, John V Pearson, Szabolcs Szelinger, Aswin Sekar, Margot Redman, Jason J Corneveaux, Traci L Pawlowski, Trisha Laub, Gary Nunn, Dietrich A Stephan, Nils Homer, Matthew J Huentelman, Identification of genetic variants using bar-coded multiplexed sequencing. Nature Methods, vol. 5, pp. 887-893 (2008), 10.1038/NMETH.1251
Marcel Martin, Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, vol. 17, pp. 10-12 (2011), 10.14806/EJ.17.1.200
Teemu Kivioja, Anna Vähärautio, Kasper Karlsson, Martin Bonke, Martin Enge, Sten Linnarsson, Jussi Taipale, Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods, vol. 9, pp. 72-74 (2012), 10.1038/NMETH.1778