Online Topic-Aware Entity Resolution Over Incomplete Data Streams (Technical Report).

Authors: Kambiz Ghazinour, Xiang Lian, Weilong Ren

Keywords: Pruning (decision trees), Semantic Web, Joins, Tuple, Imputation (statistics), Data stream mining, Computer science, Data integration, Data mining, Data extraction

Abstract: In many real applications such as data integration, social network analysis, and the Semantic Web, entity resolution (ER) is an important and fundamental problem, which identifies and links the same real-world entities from various data sources. While prior works usually consider ER over static and complete data, in practice, application data are usually collected in a streaming fashion and often incur missing attributes (due to the inaccuracy of data extraction techniques). Therefore, in this paper, we formulate and tackle a novel problem, topic-aware entity resolution over incomplete data streams (TER-iDS), which online imputes incomplete tuples and detects pairs of topic-related matching entities from incomplete data streams. In order to effectively and efficiently tackle the TER-iDS problem, we propose an effective imputation strategy, carefully design effective pruning strategies as well as indexes/synopses, and develop an efficient TER-iDS algorithm via index joins. Extensive experiments have been conducted to evaluate the effectiveness and efficiency of our proposed TER-iDS approach over real data sets.
