Author: Sourav Dutta
DOI: 10.1007/978-3-319-16354-3_31
Keywords:
Abstract: Efficient extraction of strings or sub-strings similar to an input query string is a necessity in applications such as instant search and record linkage, where the similarity between two strings is usually quantified by edit distance. This paper proposes a novel top-k approximate sub-string matching algorithm, MIST, which, for a given query, relies on the Chi-squared statistical significance of triplets and thereby avoids expensive distance computation. Experiments with real-life data validate the run-time effectiveness and accuracy of our algorithm.
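For context, the abstract names edit distance as the usual similarity measure. The sketch below illustrates that conventional baseline, not MIST itself: Levenshtein distance computed by dynamic programming, plus a brute-force top-k ranking of candidate strings. The function names `edit_distance` and `top_k_by_edit_distance` are hypothetical, chosen for illustration. Each comparison costs O(|a|·|b|), and that per-candidate cost is the kind of "expensive distance computation" the abstract says MIST avoids.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            )
        prev = curr
    return prev[len(b)]


def top_k_by_edit_distance(query, candidates, k):
    """Hypothetical baseline: the k candidates closest to query,
    ties broken alphabetically. Computes a full distance per candidate."""
    return heapq.nsmallest(k, candidates, key=lambda s: (edit_distance(query, s), s))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
    print(top_k_by_edit_distance("smith", ["smyth", "smith", "jones", "smithe"], 2))
```

In the usage example, "smith" matches exactly (distance 0), while "smithe" and "smyth" both sit at distance 1; the alphabetical tie-break selects "smithe", so the call returns `['smith', 'smithe']`.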