Vector representations for efficient comparison and search for similar strings

作者: A. M. Sokolov

DOI: 10.1007/S10559-007-0075-1

关键词:

摘要: A method is proposed for approximation of the classic edit distance between strings. The based on a mapping strings into vectors belonging to space with an easily calculable metric. preserves closeness and makes it possible accelerate computation distances. developed q-gram distances its two randomized versions improves quality in comparison well-known results.

参考文章(14)
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Similarity Search in High Dimensions via Hashing very large data bases. pp. 518- 529 ,(1999)
V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals Soviet physics. Doklady. ,vol. 10, pp. 707- 710 ,(1966)
Robert A. Wagner, Michael J. Fischer, The String-to-String Correction Problem Journal of the ACM. ,vol. 21, pp. 168- 173 ,(1974) , 10.1145/321796.321811
Gonzalo Navarro, A guided tour to approximate string matching ACM Computing Surveys. ,vol. 33, pp. 31- 88 ,(2001) , 10.1145/375360.375365
Esko Ukkonen, Approximate string-matching with q-grams and maximal matches Theoretical Computer Science. ,vol. 92, pp. 191- 211 ,(1992) , 10.1016/0304-3975(92)90143-4
Eyal Kushilevitz, Rafail Ostrovsky, Yuval Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces symposium on the theory of computing. pp. 614- 623 ,(1998) , 10.1145/276698.276877
D. A. Rachkovskii, S. V. Slipchenko, E. M. Kussul', T. N. Baidyk, A Binding Procedure for Distributed Binary Data Representations Cybernetics and Systems Analysis. ,vol. 41, pp. 319- 331 ,(2005) , 10.1007/S10559-005-0066-Z
Allan Borodin, Rafail Ostrovsky, Yuval Rabani, Lower bounds for high dimensional nearest neighbor search and related problems Proceedings of the thirty-first annual ACM symposium on Theory of computing - STOC '99. pp. 312- 321 ,(1999) , 10.1145/301250.301330
Piotr Indyk, Rajeev Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality symposium on the theory of computing. pp. 604- 613 ,(1998) , 10.1145/276698.276876
Mayur Datar, Nicole Immorlica, Piotr Indyk, Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions symposium on computational geometry. pp. 253- 262 ,(2004) , 10.1145/997817.997857