Authors: Tathagata Mukherjee, Biswas Parajuli, Piyush Kumar, Eduardo Pasiliao
DOI: 10.1109/BIGDATA.2016.7840696
Keywords: Computer science, Graph (abstract data type), Big data, Data modeling, Data mining, Information retrieval, Data aggregator
Abstract: Truth finding is the problem of determining correct information from several conflicting sources, and it is required for data aggregation. Existing algorithms solve the problem by simultaneously estimating source qualities and fact confidences, working on either numeric or non-numeric data. However, in practice, datasets are a mixture of different data types. In this work we present a unified framework for finding the truth from a collection of conflicting, authoritative sources. We assume that a small subset of independent and reliable sources is selected in a preprocessing step. We formulate truth finding as an outlier removal problem, modeling the similarities between the values reported by these sources. Our algorithm works in two stages: it first models the similarities as a graph and then finds the truth by invoking an outlier removal algorithm. We report experiments, including results on fixing records in Open Library, an open, editable library catalog.
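The two-stage idea in the abstract (build a similarity graph over the reported values, then remove outliers) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the record does not give their similarity measures or their outlier removal method. The function names `similarity`, `build_similarity_graph`, and `find_truth`, the `threshold` parameter, and the mean-similarity outlier rule are all assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed similarity in [0, 1] for mixed data types.

    Numbers are compared by relative closeness; everything else is
    compared as case-insensitive strings.
    """
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        scale = max(abs(a), abs(b), 1.0)
        return max(0.0, 1.0 - abs(a - b) / scale)
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def build_similarity_graph(values):
    """Stage 1: complete weighted graph stored as a matrix of pairwise similarities."""
    n = len(values)
    return [
        [similarity(values[i], values[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def find_truth(values, threshold=0.5):
    """Stage 2 (hypothetical rule): drop values whose mean similarity to the
    other values is below `threshold`, then return the surviving value that
    is most similar to the other survivors, plus the surviving indices.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one reported value")
    if n == 1:
        return values[0], [0]
    graph = build_similarity_graph(values)
    mean_sim = [sum(row) / (n - 1) for row in graph]
    inliers = [i for i in range(n) if mean_sim[i] >= threshold]
    if not inliers:
        # Nothing agrees with anything; fall back to keeping every value.
        inliers = list(range(n))
    best = max(inliers, key=lambda i: sum(graph[i][j] for j in inliers))
    return values[best], inliers


# Four sources report a book title; one disagrees with the rest.
titles = ["Moby Dick", "Moby-Dick", "Moby Dick", "War and Peace"]
truth, kept = find_truth(titles)
print(truth, kept)
```

In this toy setting the dissenting title has low mean similarity to the others and is discarded before a representative value is chosen. A real system would need type-specific similarity functions and a principled outlier removal algorithm, as the abstract indicates.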