On Optimal Read Trimming in Next Generation Sequencing and Its Complexity

Authors: Ivo Hedtke, Ioana Lemnian, Matthias Müller-Hannemann, Ivo Grosse

Keywords: Heuristics, Block (data storage), Trimming, Constrained optimization problem, Algorithm, Almost surely, Computer science, Quality (business)

Abstract: Read trimming is a fundamental first step of the analysis of next generation sequencing (NGS) data. Traditionally, read trimming is performed heuristically, and algorithmic work in this area has been neglected. Here, we address this topic and formulate three constrained optimization problems for block-based trimming, i.e., truncating the same low-quality positions at both ends of all reads and removing low-quality truncated reads. We find that all three problems are \(\mathcal{NP}\)-hard. However, the non-random distribution of quality scores in NGS data sets makes it tempting to speculate that the constraints are typically satisfied by fulfilling [...]. Based on this speculation, we propose three relaxed problems and develop efficient polynomial-time algorithms for them. We find that (i) the omitted constraints are indeed almost always satisfied and (ii) the algorithms yield a higher number of untrimmed bases than traditional heuristics.
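
To make the notion of block-based trimming concrete, the following is a minimal sketch, not the authors' formulation or algorithms. It assumes equal-length reads given as lists of Phred-like quality scores, and it assumes a hypothetical objective (maximize the number of retained bases) with a hypothetical constraint (each kept read has mean quality at least `min_mean`). The function names `block_trim` and `best_block` are illustrative, and the search is plain brute force over all cut pairs.

```python
from typing import List, Tuple


def block_trim(quals: List[List[int]], left: int, right: int,
               min_mean: float) -> List[List[int]]:
    """Cut `left` positions from the start and `right` positions from the
    end of every read, then drop truncated reads whose mean quality is
    below `min_mean` (an assumed quality criterion)."""
    kept = []
    for q in quals:
        window = q[left:len(q) - right]
        if window and sum(window) / len(window) >= min_mean:
            kept.append(window)
    return kept


def best_block(quals: List[List[int]], min_mean: float) -> Tuple[int, int, int]:
    """Brute-force search over all (left, right) cut pairs.

    Returns (bases_kept, left, right) for the first pair that maximizes
    the number of retained bases. Assumes all reads have equal length."""
    n = len(quals[0])
    best = (0, 0, 0)
    for left in range(n):
        for right in range(n - left):  # keeps at least one position
            kept = block_trim(quals, left, right, min_mean)
            bases = sum(len(r) for r in kept)
            if bases > best[0]:
                best = (bases, left, right)
    return best


if __name__ == "__main__":
    reads = [
        [2, 30, 30, 30, 2],
        [2, 32, 31, 30, 2],
        [2, 5, 5, 5, 2],
    ]
    print(best_block(reads, min_mean=20.0))
```

In this toy input, the same cut is applied to every read, and the third read is discarded because it stays below the threshold for every cut. The brute-force search evaluates every cut pair against every read, so it only serves to illustrate the problem shape.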

