Authors: Zhaoqiang Chen, Qun Chen, Jiajun Li, Zhanhuai Li, Lei Chen
DOI: 10.1016/J.INS.2016.03.036
Keywords:
Abstract: Due to the richness of information on the web, there is increasing interest in searching the web for missing attribute values in relational data. Web-based imputation has to first extract multiple candidate values from the web and then rank them by their matching probabilities. However, effective ranking remains challenging because web documents are unstructured, and popular search engines can only provide relevant, but not necessarily semantically matching, information. In this paper, we propose a novel probabilistic approach for ranking the web-retrieved values. It integrates various influence factors, e.g., snippet order, occurrence frequency, occurrence pattern, and keyword proximity, into a single framework through semantic reasoning. The proposed framework consists of two models: one measures the influence of a snippet, and the other measures the semantic similarity between a candidate value and a tuple. We also present parameter estimation solutions for both models. Finally, we empirically evaluate the framework's performance on real datasets. Our extensive experiments demonstrate that it outperforms state-of-the-art techniques by considerable margins in accuracy.
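The abstract names the influence factors but does not say how they are combined. The Python below is a purely illustrative weighted-vote sketch, not the authors' model. Each snippet gets a rank-based influence weight, and each candidate value extracted from it gets a keyword-proximity match score. A candidate's score is the sum of weight × match over the snippets it appears in, and the scores are then normalized into probabilities. The geometric decay, the proximity formula, the 0.1 floor, whitespace tokenization with single-token values, and all names (`Snippet`, `snippet_influence`, `keyword_proximity`, `rank_candidates`) are hypothetical assumptions. Occurrence frequency enters only through the summation, and occurrence patterns and semantic reasoning are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Snippet:
    rank: int                 # 1-based position in the search result list
    text: str                 # snippet text returned by the search engine
    candidates: List[str] = field(default_factory=list)  # values extracted from it


def snippet_influence(rank: int, decay: float = 0.8) -> float:
    """Hypothetical influence weight: geometric decay with result rank."""
    return decay ** (rank - 1)


def keyword_proximity(text: str, keywords: List[str], value: str) -> float:
    """Hypothetical match score: 1 / (1 + token distance to nearest keyword).

    Assumes single-token values and whitespace tokenization.
    """
    tokens = text.lower().split()
    if value.lower() not in tokens:
        return 0.0
    pos = tokens.index(value.lower())
    dists = [abs(tokens.index(k.lower()) - pos) for k in keywords if k.lower() in tokens]
    if not dists:
        return 0.1  # arbitrary floor: value present, but no keyword in the snippet
    return 1.0 / (1.0 + min(dists))


def rank_candidates(snippets: List[Snippet], keywords: List[str]) -> List[Tuple[str, float]]:
    """Sum influence * match over snippets, then normalize to probabilities."""
    scores: Dict[str, float] = defaultdict(float)
    for s in snippets:
        weight = snippet_influence(s.rank)
        for value in s.candidates:
            # Summing across snippets rewards values that occur frequently.
            scores[value] += weight * keyword_proximity(s.text, keywords, value)
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(v, sc / total) for v, sc in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Usage: impute the missing "capital" attribute of the tuple (France, ?).
snippets = [
    Snippet(1, "Paris is the capital of France", ["Paris"]),
    Snippet(2, "Lyon is a large city in France", ["Lyon"]),
    Snippet(3, "the capital Paris hosts the French government", ["Paris"]),
]
ranked = rank_candidates(snippets, ["capital", "France"])
print(ranked)  # Paris ranks first
```

The hand-set `decay` and floor stand in for parameters that a probabilistic framework would estimate from data rather than fix by hand.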