Authors: Martti T. Tammi, Erik Arner, Ellen Kindlund, Björn Andersson
DOI: 10.1093/NAR/GKG653
Keywords:
Abstract: Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
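The idea of separating real single base differences from sequencing errors inside a multiple alignment can be illustrated with a minimal Python sketch. It is not the MisEd algorithm: the abstract does not spell out the DNP criterion, so the sketch substitutes a simple per-column support threshold. The function name `correct_alignment`, the parameter `min_support`, and the use of `-` for gaps or uncovered positions are all assumptions made for illustration.

```python
from collections import Counter


def correct_alignment(rows, min_support=2):
    """Correct weakly supported bases in a multiple alignment.

    rows: aligned reads as equal-length strings; '-' marks a gap or
          a position the read does not cover (assumed convention).
    min_support: hypothetical threshold; a non-consensus base seen in
          at least this many reads is kept as a candidate repeat
          difference instead of being corrected.

    Returns (corrected_rows, candidate_columns).
    """
    if not rows:
        return [], []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")

    corrected = [list(r) for r in rows]
    candidate_columns = []

    for col in range(width):
        # Count the bases observed in this column, ignoring gaps.
        counts = Counter(r[col] for r in rows if r[col] != "-")
        if len(counts) < 2:
            continue  # column is unanimous or empty: nothing to decide

        # Ties are broken by first occurrence (Counter.most_common order).
        consensus, _ = counts.most_common(1)[0]

        # A minority base with enough support marks a candidate column.
        if any(n >= min_support
               for base, n in counts.items() if base != consensus):
            candidate_columns.append(col)

        # Bases below the support threshold are treated as errors.
        for row in corrected:
            base = row[col]
            if base not in ("-", consensus) and counts[base] < min_support:
                row[col] = consensus

    return ["".join(r) for r in corrected], candidate_columns


if __name__ == "__main__":
    alignment = [
        "ACGTACGT",
        "ACGTACGT",
        "ACCTACGT",  # lone 'C' in column 2: treated as an error
        "ACGTTCGT",  # 'T' in column 4 is seen twice: kept
        "ACGTTCGT",
    ]
    fixed, candidates = correct_alignment(alignment)
    print(fixed)
    print(candidates)
```

In this toy input the lone `C` in column 2 is replaced by the consensus `G`, while the `T` in column 4, which occurs in two reads, is left untouched and the column is reported as a candidate difference between repeat copies. A fixed threshold is only a stand-in here; it makes no claim about how the published method scores such columns.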