SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

Authors: Juan Falgueras, Antonio J Lara, Noe Fernandez-Pozo, Francisco R. Canton, Guillermo Perez-Trabado

Keywords: Web service, Workflow, Genetics, Data mining, Software, Throughput (business), Line (text file), Pipeline (software), Sequence (medicine), Interface (computing), Biology

Abstract: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads, and does not lead to over-trimming. SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.

