Distilling relations using knowledge bases

Authors: Shuang Hao, Nan Tang, Guoliang Li, Jian Li, Jianhua Feng

DOI: 10.1007/S00778-018-0506-9

Abstract: Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data by building connections between a relation and a KB. The main invention is that a DR simultaneously models two opposite semantics of an attribute belonging to a relation, using types and relationships in a KB: the positive semantics explains how its value should be linked to other attribute values in a correct tuple, and the negative semantics indicates how a wrong attribute value is connected to other correct attribute values within the same tuple. Naturally, a DR can mark a tuple as correct if it matches the positive semantics. Meanwhile, a DR can detect/repair an error if it matches the negative semantics. We study fundamental problems associated with DRs, e.g., rule consistency and rule implication. We present efficient algorithms to apply DRs to clean a relation, based on rule order selection and inverted indexes. Moreover, we discuss approaches to generate DRs from examples. Extensive experiments, using both real-world and synthetic datasets, verify the effectiveness and efficiency of applying DRs in practice.
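
The positive and negative semantics described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or rule language: the toy KB, the relationship names `worksAt` and `graduatedFrom`, and the helper functions `kb_objects` and `apply_rule` are all hypothetical, and the rule is reduced to a single evidence attribute and a single target attribute.

```python
# Toy knowledge base as (subject, relationship, object) triples.
# All facts and relationship names here are invented for illustration.
KB = {
    ("Alice", "worksAt", "Univ A"),
    ("Alice", "graduatedFrom", "Univ B"),
    ("Bob", "worksAt", "Univ C"),
}


def kb_objects(subject, relationship):
    """Return every object linked to `subject` by `relationship` in the toy KB."""
    return {o for (s, r, o) in KB if s == subject and r == relationship}


def apply_rule(row, evidence, target, positive, negative):
    """Apply one simplified rule to a row (a dict of attribute -> value).

    positive: relationship a correct target value should have to the evidence value.
    negative: relationship a typical wrong target value has to the evidence value.
    Returns (status, row), where status is "correct", "repaired", "error", or "unknown".
    """
    subject = row[evidence]
    value = row[target]

    # Positive semantics matched: the target value can be marked as correct.
    if value in kb_objects(subject, positive):
        return "correct", row

    # Negative semantics matched: the target value is a detected error.
    if value in kb_objects(subject, negative):
        candidates = kb_objects(subject, positive)
        if len(candidates) == 1:
            # Exactly one value satisfies the positive semantics: use it as the repair.
            repaired = dict(row)
            repaired[target] = next(iter(candidates))
            return "repaired", repaired
        return "error", row

    # Neither semantics matched: the rule makes no decision about this row.
    return "unknown", row


rule = dict(evidence="name", target="institution",
            positive="worksAt", negative="graduatedFrom")

print(apply_rule({"name": "Alice", "institution": "Univ A"}, **rule))
print(apply_rule({"name": "Alice", "institution": "Univ B"}, **rule))
print(apply_rule({"name": "Bob", "institution": "Univ Z"}, **rule))
```

In this sketch the first row is marked correct, the second is repaired because its value matches the negative relationship and the KB offers exactly one positive candidate, and the third is left undecided because neither semantics matches.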

References (61)
Guoliang Li. A human-machine method for web table understanding. Web-Age Information Management, pp. 179-189 (2013). 10.1007/978-3-642-38562-9_19
Wenfei Fan, Zhe Fan, Chao Tian, Xin Luna Dong. Keys for graphs. Proceedings of the VLDB Endowment, vol. 8, pp. 1590-1601 (2015). 10.14778/2824032.2824056
Fritz J. Scheuren, William E. Winkler, Thomas N. Herzog. Data Quality and Record Linkage Techniques (2007).
Victor Vianu, Serge Abiteboul, Richard Hull. Foundations of Databases (1994).
Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, Wei Zhang. From data fusion to knowledge fusion. Proceedings of the VLDB Endowment, vol. 7, pp. 881-892 (2014). 10.14778/2732951.2732962
Matteo Interlandi, Nan Tang. Proof positive and negative in data cleaning. International Conference on Data Engineering, pp. 18-29 (2015). 10.1109/ICDE.2015.7113269
Mohamed Morsey, Jens Lehmann, Sören Auer, Axel-Cyrille Ngonga Ngomo. DBpedia SPARQL benchmark: performance assessment with real queries on real data. International Semantic Web Conference, pp. 454-469 (2011). 10.1007/978-3-642-25073-6_29
Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré. Incremental knowledge base construction using DeepDive. Proceedings of the VLDB Endowment, vol. 8, pp. 1310-1321 (2015). 10.14778/2809974.2809991
Wenfei Fan, Xibei Jia, Jianzhong Li, Shuai Ma. Reasoning about record matching rules. Proceedings of the VLDB Endowment, vol. 2, pp. 407-418 (2009). 10.14778/1687627.1687674
Guoliang Li, Dong Deng, Jiannan Wang, Jianhua Feng. Pass-join. Proceedings of the VLDB Endowment, vol. 5, pp. 253-264 (2011). 10.14778/2078331.2078340