Authors: Shuang Hao, Nan Tang, Guoliang Li, Jian Li, Jianhua Feng
DOI: 10.1007/S00778-018-0506-9
Keywords:
Abstract: Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rule that can make actionable decisions on relational data by building connections between a relation and a KB. The main invention is that a DR simultaneously models two opposite semantics of an attribute belonging to a relation, using types and relationships in a KB: the positive semantics explains how its value should be linked to other attribute values in a correct tuple, and the negative semantics indicates how a wrong attribute value is connected to other correct attribute values within the same tuple. Naturally, a DR can mark correct values in a tuple if it matches the positive semantics. Meanwhile, a DR can detect and repair an error if it matches the negative semantics. We study fundamental problems associated with DRs, e.g., rule consistency and rule implication. We present efficient algorithms that apply DRs to clean a relation, based on rule order selection and inverted indexes. Moreover, we discuss approaches for generating DRs from examples. Extensive experiments on both real-world and synthetic datasets verify the effectiveness and efficiency of applying DRs in practice.
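To make the positive/negative semantics concrete, here is a minimal sketch under heavily simplified assumptions; it is not the authors' formalism. A DR is reduced to one target attribute plus one positive and one negative KB relation, the KB is a toy set of (subject, relation, object) triples, and the names `DetectiveRule`, `apply_rule`, the relations `bornIn`/`locatedIn`, and all entities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# A KB fact as (subject, relation, object); a toy stand-in for a curated KB.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class DetectiveRule:
    """Simplified, hypothetical DR over one target attribute."""
    target: str      # attribute whose value the rule judges, e.g. "City"
    pos_anchor: str  # attribute linked to a correct target value
    pos_rel: str     # positive semantics: (row[pos_anchor], pos_rel, target) in KB
    neg_anchor: str  # attribute linked to a wrong target value
    neg_rel: str     # negative semantics: (row[neg_anchor], neg_rel, target) in KB


def apply_rule(rule: DetectiveRule, row: Dict[str, str],
               kb: Set[Triple]) -> Tuple[str, Optional[str]]:
    """Return (status, value): 'correct', 'repaired', 'error' or 'unknown'."""
    value = row[rule.target]
    # Positive semantics matched: mark the target value as correct.
    if (row[rule.pos_anchor], rule.pos_rel, value) in kb:
        return ("correct", value)
    # Negative semantics matched: the value is wrong; try to repair it.
    if (row[rule.neg_anchor], rule.neg_rel, value) in kb:
        # Candidate repairs are the values that satisfy the positive semantics.
        candidates = sorted(o for (s, r, o) in kb
                            if s == row[rule.pos_anchor] and r == rule.pos_rel)
        if len(candidates) == 1:
            return ("repaired", candidates[0])
        return ("error", None)  # error detected, but no unique repair
    # Neither semantics matched: the rule makes no decision.
    return ("unknown", None)


if __name__ == "__main__":
    kb = {("Alice", "bornIn", "Springfield"),
          ("UnivX", "locatedIn", "Rivertown")}
    rule = DetectiveRule("City", "Institution", "locatedIn", "Name", "bornIn")
    row = {"Name": "Alice", "Institution": "UnivX", "City": "Springfield"}
    print(apply_rule(rule, row, kb))  # ('repaired', 'Rivertown')
```

In this toy run the City value is the person's birth city (negative semantics), so the sketch replaces it with the city where the institution is located (positive semantics). The linear scan over the KB is only for brevity; it stands in for the inverted indexes and rule ordering that the abstract says the actual algorithms use.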