A bit-vector algorithm for computing Levenshtein and Damerau edit distances

作者: Heikki Hyyrö

DOI:

关键词:

摘要: The edit distance between strings A and B is defined as the minimum number of operations needed in converting into or vice versa. Levenshtein allows three types operations: an insertion, a deletion substitution character. Damerau previous plus addition transposition two adjacent characters. To our best knowledge current practical algorithms for computing these distances run time O(dm) O(⌈m/w⌉(n + σ)), where d strings, m n are their lengths (m ≤ n), w computer word size σ alphabet. In this paper we present algorithm that runs O(⌈d/w⌉m ⌈n/w⌉σ) O(⌈d/w⌉n ⌈m/w⌉σ). structure such, practice it mostly suitable testing whether within some pre-determined error threshold. We also initial test results with thresholded computation. them works faster than original Myers.

参考文章(10)
Heikki Hyyrö, Gonzalo Navarro, Faster Bit-Parallel Approximate String Matching combinatorial pattern matching. pp. 203- 224 ,(2002) , 10.1007/3-540-45452-7_18
V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals Soviet physics. Doklady. ,vol. 10, pp. 707- 710 ,(1966)
Robert A. Wagner, Michael J. Fischer, The String-to-String Correction Problem Journal of the ACM. ,vol. 21, pp. 168- 173 ,(1974) , 10.1145/321796.321811
Esko Ukkonen, Algorithms for approximate string matching Information and Control. ,vol. 64, pp. 100- 118 ,(1985) , 10.1016/S0019-9958(85)80046-2
Esko Ukkonen, Finding approximate patterns in strings Journal of Algorithms. ,vol. 6, pp. 132- 137 ,(1985) , 10.1016/0196-6774(85)90023-9
Gonzalo Navarro, NR-grep: a fast and flexible pattern-matching tool Software - Practice and Experience. ,vol. 31, pp. 1265- 1312 ,(2001) , 10.1002/SPE.411
Sun Wu, Udi Manber, Fast text searching: allowing errors Communications of The ACM. ,vol. 35, pp. 83- 91 ,(1992) , 10.1145/135239.135244
Fred J. Damerau, A technique for computer detection and correction of spelling errors Communications of the ACM. ,vol. 7, pp. 171- 176 ,(1964) , 10.1145/363958.363994
Eugene W Myers, None, An O ( ND ) difference algorithm and its variations Algorithmica. ,vol. 1, pp. 251- 266 ,(1986) , 10.1007/BF01840446