Authors: Kambiz Ghazinour, Xiang Lian, Weilong Ren
DOI:
Keywords: Pruning (decision trees), Semantic Web, Joins, Tuple, Imputation (statistics), Data stream mining, Computer science, Data integration, Data mining, Data extraction
Abstract: In many real applications, such as data integration, social network analysis, and the Semantic Web, entity resolution (ER) is an important and fundamental problem: it identifies and links the same real-world entities across various data sources. While prior works usually consider ER over static and complete data, in practice application data are often collected in a streaming fashion and frequently have missing attributes (due to the inaccuracy of data extraction techniques). Therefore, in this paper, we formulate and tackle a novel problem, topic-aware entity resolution over incomplete data streams (TER-iDS), which imputes incomplete tuples online and detects pairs of topic-related matching entities from incomplete data streams. To tackle TER-iDS effectively and efficiently, we propose an effective imputation strategy, carefully design pruning strategies as well as indexes/synopses, and develop an efficient TER-iDS algorithm via index joins. Extensive experiments have been conducted to evaluate the effectiveness and efficiency of our proposed approach over data sets.
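To make the problem setting concrete, here is a toy sketch of the three ingredients the abstract names (online imputation, topic filtering, and pruned matching over a stream). It is not the authors' TER-iDS algorithm and uses no index joins: the count-based sliding window, the nearest-neighbour imputation rule, the summed per-attribute Jaccard similarity, the size-based pruning bound J(a, b) <= min(|a|, |b|) / max(|a|, |b|), and every function name (`impute`, `topic_aware_er`, etc.) are hypothetical assumptions chosen for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, Iterator, List, Optional, Tuple

# Hypothetical record model: attribute name -> token set, or None if missing.
Record = Dict[str, Optional[FrozenSet[str]]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Exact Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_upper_bound(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Cheap bound: |a & b| <= min size and |a | b| >= max size."""
    hi = max(len(a), len(b))
    return min(len(a), len(b)) / hi if hi else 0.0


def impute(rec: Record, window: List[Record]) -> Record:
    """Assumed strategy: copy each missing attribute from the window tuple
    most similar on the attributes `rec` does have; empty set if none found."""
    filled = dict(rec)
    observed = [k for k, v in rec.items() if v is not None]
    for attr, val in rec.items():
        if val is not None:
            continue
        best, best_score = frozenset(), -1.0
        for other in window:
            cand = other.get(attr)
            if cand is None:
                continue
            score = sum(jaccard(rec[k], other[k])
                        for k in observed if other.get(k) is not None)
            if score > best_score:
                best, best_score = cand, score
        filled[attr] = best
    return filled


def is_topic_related(rec: Record, topics: FrozenSet[str]) -> bool:
    """Assumed topic test: some attribute contains a topic keyword."""
    return any(v is not None and bool(v & topics) for v in rec.values())


def topic_aware_er(events: Iterable[Tuple[str, Record]], topics: FrozenSet[str],
                   threshold: float, window_size: int) -> Iterator[Tuple[Record, Record]]:
    """Yield (earlier, newer) cross-source pairs of topic-related tuples whose
    summed per-attribute Jaccard similarity reaches `threshold`."""
    window: deque = deque(maxlen=window_size)  # (source_id, imputed record)
    for src, rec in events:
        full = impute(rec, [r for _, r in window])  # online imputation
        if is_topic_related(full, topics):
            for other_src, other in window:
                if other_src == src or not is_topic_related(other, topics):
                    continue
                attrs = full.keys() & other.keys()
                # Pruning: skip the exact check when even the upper bound fails.
                if sum(jaccard_upper_bound(full[k], other[k]) for k in attrs) < threshold:
                    continue
                if sum(jaccard(full[k], other[k]) for k in attrs) >= threshold:
                    yield other, full
        window.append((src, full))  # keep every tuple: it can help later imputation


events = [
    ("A", {"name": frozenset({"apple", "iphone"}), "desc": frozenset({"phone", "ios"})}),
    ("B", {"name": frozenset({"apple", "iphone"}), "desc": None}),
]
pairs = list(topic_aware_er(events, frozenset({"phone"}), threshold=1.5, window_size=10))
```

In the usage example, the second tuple's missing `desc` is imputed from the first tuple (their `name` sets match exactly), after which both are topic-related and the pair passes the pruning bound and the exact threshold. The design choice here is simply to run the cheap bound before the exact similarity so that most non-matching pairs are rejected without computing intersections and unions; the paper's own pruning strategies and synopses are likely more elaborate.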