作者: Douglas Burdick , Robert Szczerba
DOI:
关键词:
摘要: A system learns a record similarity measurement. The includes set of clusters. Each in each cluster may have list fields and data contained field. further include predetermined threshold score for two the records one clusters to be considered similar at least decision tree constructed from portion encodes rules determining field related fields. an output pairs that are determined duplicate records. greater than or equal score.