TruthCore: Non-parametric estimation of truth from a collection of authoritative sources

Authors: Tathagata Mukherjee, Biswas Parajuli, Piyush Kumar, Eduardo Pasiliao

DOI: 10.1109/BIGDATA.2016.7840696

Keywords: Computer science, Graph (abstract data type), Big data, Data modeling, Data mining, Information retrieval, Data aggregator

Abstract: Truth finding is the problem of determining correct information from several conflicting sources, and it is required for data aggregation. Existing algorithms solve the problem by simultaneously estimating source qualities and fact confidences, working on either numeric or non-numeric data. In practice, however, datasets are a mixture of different data types. In this work we present a unified framework for finding the truth from a collection of conflicting, authoritative sources. We assume that a small subset of independent and reliable sources has been selected in a preprocessing step. We formulate truth finding as an outlier removal problem, modeling the similarities between the values reported by these sources. Our algorithm works in two stages: it first models the similarities as a graph, and then finds the truth by invoking an outlier removal algorithm. We report experiments, including results on fixing records in Open Library, an open, editable library catalog.
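The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the similarity measure or the outlier removal procedure, so the `similarity` function (relative difference for numbers, `difflib` ratio for everything else), the greedy peeling rule, and the `keep_fraction` parameter are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] for mixed-type values (an assumed measure)."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        scale = max(abs(a), abs(b))
        return 1.0 if scale == 0 else max(0.0, 1.0 - abs(a - b) / scale)
    # Non-numeric values are compared as case-insensitive strings.
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def build_similarity_graph(values):
    """Stage 1: complete weighted graph, stored as a symmetric matrix."""
    n = len(values)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            graph[i][j] = graph[j][i] = similarity(values[i], values[j])
    return graph


def estimate_truth(values, keep_fraction=0.5):
    """Stage 2 (assumed rule): repeatedly drop the value least similar to
    the remaining ones, then return the most central survivor."""
    if not values:
        raise ValueError("need at least one reported value")
    graph = build_similarity_graph(values)
    alive = set(range(len(values)))
    target = max(1, int(len(values) * keep_fraction))

    def weighted_degree(i):
        # Total similarity of value i to the values still considered.
        return sum(graph[i][j] for j in alive if j != i)

    while len(alive) > target:
        alive.remove(min(alive, key=weighted_degree))
    return values[max(alive, key=weighted_degree)]


# One value per source for the same record field.
titles = ["Moby Dick", "Moby-Dick", "Moby Dick", "Mobi Dik", "War and Peace"]
years = [1851, 1851, 1850, 1951, 18]
print(estimate_truth(titles), estimate_truth(years))
```

Because the same graph and peeling code handle both the title strings and the publication years, the sketch reflects the abstract's point that a single similarity-based formulation can cover mixed data types; only `similarity` needs to know about the type.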

References (23)
Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, Wei Zhang. Knowledge-based trust. Proceedings of the VLDB Endowment, vol. 8, pp. 938-949 (2015). DOI: 10.14778/2777598.2777603
Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh Srivastava. Truth finding on the deep web. Proceedings of the VLDB Endowment, vol. 6, pp. 97-108 (2012). DOI: 10.14778/2535568.2448943
Xian Li, Xin Luna Dong, Kenneth B. Lyons, Weiyi Meng, Divesh Srivastava. Scaling up copy detection. International Conference on Data Engineering, pp. 89-100 (2015). DOI: 10.1109/ICDE.2015.7113275
Samidh Chatterjee, Bradley Neff, Piyush Kumar. Instant approximate 1-center on road networks via embeddings. Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '11), pp. 369-372 (2011). DOI: 10.1145/2093973.2094025
John Dunagan, Santosh Vempala. Optimal outlier removal in high-dimensional. Proceedings of the thirty-third annual ACM symposium on Theory of computing (STOC '01), pp. 627-636 (2001). DOI: 10.1145/380752.380860
R. Kannan, L. Lovász, M. Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, vol. 13, pp. 541-559 (1995). DOI: 10.1007/BF02574061
Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava. Integrating conflicting data. Proceedings of the VLDB Endowment, vol. 2, pp. 550-561 (2009). DOI: 10.14778/1687627.1687690
Xiaoxin Yin, Jiawei Han, P.S. Yu. Truth Discovery with Multiple Conflicting Information Providers on the Web. IEEE Transactions on Knowledge and Data Engineering, vol. 20, pp. 796-808 (2008). DOI: 10.1109/TKDE.2007.190745
Erich Leo Lehmann. Theory of point estimation (1950).