Rule-Based Entity Resolution on Database with Hidden Temporal Information

Authors: Hongzhi Wang, Xiaoou Ding, Jianzhong Li, Hong Gao

DOI: 10.1109/TKDE.2018.2816018

Abstract: In this paper, we deal with the problem of rule-based entity resolution on imprecise temporal data. Entity resolution (ER) is widely explored in the research community, but temporal data, especially data without available timestamps, has not yet been studied well. Because of the elapsing of time, records referring to the same entity but observed in different time periods may be different. Besides traditional similarity-based ER approaches, by carefully exploring several data quality rules, e.g., matching dependency and data currency, much information can be obtained to help cope with the problem. We use such rules to derive the records' temporal order and the trend of their attributes' evolvement with the elapsing of time. Specifically, we first block records into smaller blocks, and then, by exploring the data currency constraints, we propose a temporal clustering approach in two steps, i.e., skeleton clustering and banding clustering. Experimental results on both real and synthetic data show that our method can achieve both high accuracy and efficiency on datasets with hidden temporal information.
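The first two ideas in the abstract, blocking records and using a currency constraint to recover a temporal order without timestamps, can be illustrated with a minimal sketch. This is not the authors' algorithm: the record fields, the `STATUS_ORDER` constraint, and the helper names `block_by_key` and `order_by_currency` are all hypothetical, and the skeleton and banding clustering steps are not shown.

```python
from collections import defaultdict

# Hypothetical currency constraint (an assumption for illustration):
# a person's marital status only moves forward along this order.
STATUS_ORDER = {"single": 0, "married": 1, "divorced": 2}


def block_by_key(records, key):
    """Group records into blocks that share the same blocking-key value."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[key]].append(rec)
    return dict(blocks)


def order_by_currency(block, attr, order):
    """Sort the records of one block from least to most current,
    using the rank that the currency constraint assigns to `attr`."""
    return sorted(block, key=lambda rec: order[rec[attr]])


# Toy records without timestamps (made-up data).
records = [
    {"id": 1, "name": "A. Smith", "status": "married"},
    {"id": 2, "name": "A. Smith", "status": "single"},
    {"id": 3, "name": "B. Jones", "status": "single"},
]

blocks = block_by_key(records, "name")
ordered = order_by_currency(blocks["A. Smith"], "status", STATUS_ORDER)
ordered_ids = [rec["id"] for rec in ordered]
```

In this toy setting the constraint alone places record 2 before record 1, even though neither carries a timestamp. A real system would presumably block on a similarity-based key rather than exact name equality and would combine several constraints.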
