Efficient Entity Resolution on Heterogeneous Records

Authors: Yiming Lin, Hongzhi Wang, Jianzhong Li, Hong Gao

DOI: 10.1109/TKDE.2019.2898191

Keywords: Data mining, Schema matching, Data integration, Information retrieval, Computer science, Data exchange, Schema (psychology)

Abstract: Entity resolution (ER) is the problem of identifying and merging records that refer to the same real-world entity. In many scenarios, raw records are stored under a heterogeneous environment. Specifically, the schemas of records may differ from each other. To leverage such records better, most existing work assumes that schema matching and data exchange have been done to convert records under different schemas to those under a predefined schema. However, we observe that schema matching would lose information in some cases, which could be useful or even crucial to ER. To leverage sufficient information from heterogeneous sources, in this paper, we address several challenges of ER on heterogeneous records and show that none of the existing similarity metrics or their transformations could be applied to find similar records under heterogeneous settings. Motivated by this, we design a similarity function and propose a novel framework to iteratively find records that refer to the same entity. Regarding efficiency, we build an index to generate candidates and accelerate similarity computation. Evaluations on datasets show the effectiveness and efficiency of our methods.


