SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

Authors: Juan Falgueras, Antonio J Lara, Noe Fernandez-Pozo, Francisco R. Canton, Guillermo Perez-Trabado

DOI: 10.1186/1471-2105-11-38

Keywords: Web service, Workflow, Genetics, Data mining, Software, Throughput (business), Line (text file), Pipeline (software), Sequence (medicine), Interface (computing), Biology

Abstract: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads, and does not lead to over-trimming. SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
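To make two of the stages named in the abstract concrete (adaptor removal and low-quality trimming), here is a minimal Python sketch. It is not SeqTrim's implementation: the function names, the exact-match adaptor test, the end-trimming rule and the default thresholds (`min_q`, `min_len`) are all assumptions chosen for illustration only.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred-style quality scores)


def clip_adaptor(seq: str, quals: List[int], adaptor: str) -> Read:
    """Remove an exact copy of the adaptor from the 5' end, if present.

    Assumption: exact prefix match only; real tools tolerate mismatches.
    """
    if adaptor and seq.startswith(adaptor):
        n = len(adaptor)
        return seq[n:], quals[n:]
    return seq, quals


def trim_low_quality(seq: str, quals: List[int], min_q: int = 20) -> Read:
    """Trim bases from both ends while their quality is below min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def preprocess(seq: str, quals: List[int], adaptor: str,
               min_q: int = 20, min_len: int = 5) -> Optional[Read]:
    """Run the two hypothetical stages in order; discard reads that end up too short."""
    seq, quals = clip_adaptor(seq, quals, adaptor)
    seq, quals = trim_low_quality(seq, quals, min_q)
    if len(seq) < min_len:
        return None  # read rejected as too short after trimming
    return seq, quals


if __name__ == "__main__":
    read = "ACGTTTGACCAGG"
    scores = [30, 30, 30, 30, 10, 30, 30, 30, 30, 30, 30, 5, 8]
    print(preprocess(read, scores, adaptor="ACGT"))
```

Chaining small stages that each take and return a read mirrors the pipeline idea described in the abstract: stages can be reordered, reconfigured or inspected individually, although the real tool's stages and ordering may differ.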
