Correcting errors in shotgun sequences

Authors: Martti T Tammi, Erik Arner, Ellen Kindlund, Björn Andersson

DOI: 10.1093/NAR/GKG653

Abstract: Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
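The idea of separating repeat differences from sequencing errors by inspecting a read together with its overlaps can be illustrated with a small sketch. This is not the MisEd implementation and does not reproduce the published DNP statistics. It assumes a simplified rule: in an already constructed multiple alignment, a minority base seen in at least `min_support` reads is kept as a candidate single base difference, while a minority base seen fewer times is treated as a likely error and replaced by the column consensus. The function names, the `min_support` parameter, and the `-` gap symbol are all hypothetical.

```python
from collections import Counter


def classify_columns(alignment, min_support=2):
    """Label each column of a multiple alignment.

    alignment: list of equal-length strings; '-' marks a gap or no coverage.
    min_support: assumed threshold; a minority base seen in at least this
    many reads is treated as a candidate repeat difference, not an error.
    Returns a list of (label, consensus_base) tuples, one per column.
    """
    labels = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != "-")
        if not counts:
            labels.append(("empty", None))
            continue
        ranked = counts.most_common()
        consensus = ranked[0][0]
        minority = ranked[1:]
        if not minority:
            labels.append(("uniform", consensus))
        elif any(n >= min_support for _, n in minority):
            # Deviation recurs in several reads: keep it untouched.
            labels.append(("candidate_difference", consensus))
        else:
            # Every deviation is below the threshold: assume base-calling errors.
            labels.append(("likely_error", consensus))
    return labels


def correct_read(read_index, alignment, min_support=2):
    """Return one read with likely errors replaced by the column consensus."""
    labels = classify_columns(alignment, min_support)
    corrected = []
    for base, (label, consensus) in zip(alignment[read_index], labels):
        if label == "likely_error" and base not in ("-", consensus):
            corrected.append(consensus)
        else:
            corrected.append(base)
    return "".join(corrected)


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACTTACGA", "ACTTACGT", "ACGTACGT"]
    print(correct_read(2, reads))
```

In the toy input, the third column has a T shared by two reads and is left alone, while the lone A in the last column of the third read is replaced by the consensus. A fixed count threshold is only a stand-in for a proper statistical test of coinciding deviations.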

References (11)
Marie-France Sagot. Spelling Approximate Repeated or Common Motifs Using a Suffix Tree. Latin American Symposium on Theoretical Informatics, pp. 374-390 (1998).
R. W. Hamming. Error detecting and error correcting codes. Bell System Technical Journal, vol. 29, pp. 147-160 (1950). DOI: 10.1002/J.1538-7305.1950.TB00463.X
Gonzalo Navarro. A guided tour to approximate string matching. ACM Computing Surveys, vol. 33, pp. 31-88 (2001). DOI: 10.1145/375360.375365
Martti T Tammi, Erik Arner, Björn Andersson. TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences. Computer Methods and Programs in Biomedicine, vol. 70, pp. 47-59 (2003). DOI: 10.1016/S0169-2607(01)00194-8
M. T. Tammi, E. Arner, T. Britton, B. Andersson. Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs. Bioinformatics, vol. 18, pp. 379-388 (2002). DOI: 10.1093/BIOINFORMATICS/18.3.379
E. Eichler. Repetitive Conundrums of Centromere Structure and Function. Human Molecular Genetics, vol. 8, pp. 151-155 (1999). DOI: 10.1093/HMG/8.2.151
Brent Ewing, Phil Green. Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Research, vol. 8, pp. 186-194 (1998). DOI: 10.1101/GR.8.3.186
Brent Ewing, LaDeana Hillier, Michael C. Wendl, Phil Green. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Research, vol. 8, pp. 175-185 (1998). DOI: 10.1101/GR.8.3.175
P. A. Pevzner, H. Tang, M. S. Waterman. An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences of the United States of America, vol. 98, pp. 9748-9753 (2001). DOI: 10.1073/PNAS.171285098
Serafim Batzoglou, David B Jaffe, Ken Stanley, Jonathan Butler, Sante Gnerre, Evan Mauceli, Bonnie Berger, Jill P Mesirov, Eric S Lander. ARACHNE: A Whole-Genome Shotgun Assembler. Genome Research, vol. 12, pp. 177-189 (2002). DOI: 10.1101/GR.208902