Efficient Q-Gram Filters for Finding All Epsilon-Matches Over a Given Length

作者: Kim R. Rasmussen , Jens Stoye , Eugene W. Myers

DOI: 10.1089/CMB.2006.13.296

关键词:

摘要: Fast and exact comparison of large genomic sequences remains a challenging task in biosequence analysis. We consider the problem finding all epsilon-matches between two sequences, i.e., local alignments over given length with an error rate at most epsilon. study this theoretically, giving efficient q-gram filter for solving it. Two applications are also discussed, particular sequence assembly BLAST-like comparison. Our results show that method is 25 times faster than BLAST, while not being heuristic.

参考文章(12)
David Eppstein, Zvi Galil, Raffaele Giancarlo, Giuseppe F. Italiano, Sparse dynamic programming I Journal of the ACM. ,vol. 39, pp. 519- 545 ,(1992) , 10.1145/146637.146650
W. R. Pearson, D. J. Lipman, Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America. ,vol. 85, pp. 2444- 2448 ,(1988) , 10.1073/PNAS.85.8.2444
Esko Ukkonen, Approximate string-matching with q-grams and maximal matches Theoretical Computer Science. ,vol. 92, pp. 191- 211 ,(1992) , 10.1016/0304-3975(92)90143-4
W. I. Chang, E. L. Lawler, Sublinear approximate string matching and biological applications Algorithmica. ,vol. 12, pp. 327- 344 ,(1994) , 10.1007/BF01185431
Gene Myers, Richard Durbin, A table-driven, full-sensitivity similarity search algorithm. Journal of Computational Biology. ,vol. 10, pp. 103- 117 ,(2003) , 10.1089/106652703321825919
T.F. Smith, M.S. Waterman, Identification of common molecular subsequences. Journal of Molecular Biology. ,vol. 147, pp. 195- 197 ,(1981) , 10.1016/0022-2836(81)90087-5
B. Ma, J. Tromp, M. Li, PatternHunter: faster and more sensitive homology search Bioinformatics. ,vol. 18, pp. 440- 445 ,(2002) , 10.1093/BIOINFORMATICS/18.3.440
Ning ZeMin Ning ZeMin, AJ Cox, JC Mullikin, SSAHA: A Fast Search Method for Large DNA Databases Genome Research. ,vol. 11, pp. 1725- 1729 ,(2001) , 10.1101/GR.194201