TagDust2: A generic method to extract reads from sequencing data

Keywords: 2 base encoding, Biology, Hybrid genome assembly, Hidden Markov model, Sequence assembly, DNA sequencing theory, Data mining, ABI Solid Sequencing, Shotgun sequencing, Genetics, Massive parallel sequencing

Abstract: Arguably the most basic step in the analysis of next generation sequencing (NGS) data involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single-end and paired-end libraries, and of libraries containing unique molecular identifiers, is fully supported. Two additional post-processing steps are included to exclude known contaminants and to filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together, TagDust2 is a feature-rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net .
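To make the idea of a "read architecture" concrete, the following is a minimal Python sketch of splitting a raw read laid out as [barcode][unique molecular identifier][insert]. It is not TagDust2's method: TagDust2 uses hidden Markov models, whereas this sketch swaps in plain Hamming-distance matching of a fixed-position barcode, which cannot handle insertions or deletions. The function names, the `barcodes` mapping, and the parameters `umi_len` and `max_mismatch` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_read(
    raw: str,
    barcodes: Dict[str, str],
    umi_len: int,
    max_mismatch: int = 1,
) -> Optional[Tuple[str, str, str]]:
    """Split a raw read assumed to be laid out as [barcode][UMI][insert].

    Hypothetical helper, not part of TagDust2. Returns (sample, umi, insert)
    for the closest barcode within max_mismatch substitutions, or None when
    no barcode is close enough or the read is too short.
    """
    best = None  # (distance, sample name, barcode length)
    for sample, barcode in barcodes.items():
        # Skip barcodes that would leave no insert sequence behind.
        if len(raw) < len(barcode) + umi_len + 1:
            continue
        distance = hamming(raw[: len(barcode)], barcode)
        # Keep the closest barcode; on a tie the first one seen wins.
        if distance <= max_mismatch and (best is None or distance < best[0]):
            best = (distance, sample, len(barcode))
    if best is None:
        return None
    _, sample, barcode_len = best
    umi = raw[barcode_len : barcode_len + umi_len]
    insert = raw[barcode_len + umi_len :]
    return sample, umi, insert


if __name__ == "__main__":
    # Made-up barcodes and read, for illustration only.
    samples = {"s1": "ACGT", "s2": "TTGA"}
    print(extract_read("ACGTAACCGGGTTTAC", samples, umi_len=4))
```

A fixed-position comparison like this fails as soon as an indel shifts the barcode, which is one motivation for a probabilistic model of the whole read such as the HMM approach the abstract describes.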


