Authors: Yiming Lin, Hongzhi Wang, Jianzhong Li, Hong Gao
DOI: 10.1109/TKDE.2019.2898191
Keywords: Data mining, Schema matching, Data integration, Information retrieval, Computer science, Data exchange, Schema (psychology)
Abstract: Entity resolution (ER) is the problem of identifying and merging records that refer to the same real-world entity. In many scenarios, raw records are stored in heterogeneous environments. Specifically, the schemas of records may differ from each other. To leverage such records better, most existing work assumes that schema matching and data exchange have been done to convert records under different schemas to those under a predefined schema. However, we observe that schema matching would lose information in some cases, which could be useful or even crucial to ER. To leverage sufficient information from heterogeneous sources, in this paper, we address several challenges of ER on heterogeneous records and show that none of the existing similarity metrics or their transformations could be applied to find similar records under heterogeneous settings. Motivated by this, we design a similarity function and propose a novel framework to iteratively find records that refer to the same entity. Regarding efficiency, we build an index to generate candidates and accelerate similarity computation. Evaluations on real-world datasets show the effectiveness and efficiency of our methods.