Authors: Ying Xu, Venkatesh Ganti
DOI:
Keywords:
Abstract: A technique for probabilistically determining fuzzy duplicates includes converting a plurality of tuples into hash vectors utilizing a locality sensitive hashing algorithm. The hash vectors are sorted, on one or more vector coordinates, to cluster similar coordinate values together. Each cluster of two or more hash vectors identifies candidate tuples. The candidate tuples are compared utilizing a similarity function. Tuples which are more similar than a specified threshold are returned.
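The pipeline described in the abstract (hash, sort on each coordinate, cluster, compare, filter) can be illustrated with a minimal sketch. The abstract does not name a particular hashing family or similarity function, so the choices here are assumptions made only for illustration: MinHash over character 3-grams stands in for the locality sensitive hashing step, Jaccard similarity stands in for the similarity function, and a cluster is taken to be a run of equal values on one coordinate after sorting. All function names and default parameters (`tokens`, `minhash_vector`, `fuzzy_duplicates`, `num_hashes=16`, `threshold=0.5`) are hypothetical.

```python
import hashlib
from itertools import combinations, groupby


def tokens(text, q=3):
    """Split a string into a set of lowercase character q-grams (assumed tokenization)."""
    s = text.lower()
    if len(s) <= q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects one hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_vector(token_set, num_hashes=16):
    """Hash vector: one MinHash value per coordinate (assumed LSH family)."""
    return tuple(min(_hash(seed, t) for t in token_set) for seed in range(num_hashes))


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets (assumed similarity function)."""
    return len(a & b) / len(a | b)


def fuzzy_duplicates(records, threshold=0.5, num_hashes=16):
    """Return pairs of records whose similarity exceeds the threshold."""
    token_sets = [tokens(r) for r in records]
    vectors = [minhash_vector(ts, num_hashes) for ts in token_sets]

    candidates = set()
    for coord in range(num_hashes):
        # Sort on this coordinate so equal hash values become adjacent.
        order = sorted(range(len(records)), key=lambda i: vectors[i][coord])
        for _, group in groupby(order, key=lambda i: vectors[i][coord]):
            members = list(group)
            # A cluster of two or more hash vectors identifies candidate tuples.
            for i, j in combinations(members, 2):
                candidates.add((min(i, j), max(i, j)))

    # Verify each candidate pair with the similarity function.
    results = []
    for i, j in sorted(candidates):
        if jaccard(token_sets[i], token_sets[j]) > threshold:
            results.append((records[i], records[j]))
    return results
```

For example, `fuzzy_duplicates(["Microsoft Corp", "microsoft corp", "Apple Inc"])` returns `[("Microsoft Corp", "microsoft corp")]`: the first two records produce identical hash vectors and so always fall into the same cluster. The method is probabilistic in the sense that a pair with Jaccard similarity s shares a given MinHash coordinate with probability s, so less similar pairs may never become candidates; using more coordinates raises recall at the cost of more comparisons.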