A Truth Discovery Approach with Theoretical Guarantee

Authors: Houping Xiao, Jing Gao, Zhaoran Wang, Shiyu Wang, Lu Su

DOI: 10.1145/2939672.2939816

Abstract: In the information age, people can easily collect information about the same set of entities from multiple sources, among which conflicts are inevitable. This leads to an important task, truth discovery, i.e., to identify true facts (truths) via iteratively updating truths and source reliability. However, convergence to the truths is never discussed in existing work, and thus there is no theoretical guarantee in the results of these truth discovery approaches. In contrast, in this paper we propose a truth discovery approach with theoretical guarantee. We propose a randomized Gaussian mixture model (RGMM) to represent multi-source data, where truths are model parameters. We incorporate source bias, which captures its reliability degree, into the RGMM formulation. The truth discovery task is then modeled as seeking the maximum likelihood estimate (MLE) of the truths. Based on expectation-maximization (EM) techniques, we propose population-based (i.e., on the limit of infinite data) and sample-based (i.e., on finite samples) solutions for the MLE. Theoretically, we prove that both solutions are contractive to an ε-ball around the MLE, under certain conditions. Experimentally, we evaluate our method on both simulated and real-world datasets. Experimental results show that our method achieves high accuracy in identifying truths.
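The phrase "iteratively updating truths and source reliability" in the abstract can be illustrated with a minimal sketch. This is not the paper's RGMM or its EM solution: it swaps in a simple log-ratio weighting heuristic for continuous claims, and the function name `discover_truths`, its parameters, and the squared-error loss are all assumptions made for illustration only.

```python
import numpy as np

def discover_truths(claims, n_iter=100, tol=1e-10, eps=1e-12):
    """Alternate between estimating truths and source weights.

    claims: array-like of shape (n_sources, n_entities), one row per source.
    Returns (truths, weights). Hypothetical helper, not the paper's method.
    Assumes at least two sources.
    """
    claims = np.asarray(claims, dtype=float)
    truths = claims.mean(axis=0)              # start from the unweighted mean
    weights = np.ones(claims.shape[0])
    for _ in range(n_iter):
        # Reliability step: a source far from the current truths gets a
        # small weight (log-ratio heuristic; eps avoids log(0)).
        losses = ((claims - truths) ** 2).sum(axis=1) + eps
        weights = -np.log(losses / losses.sum())
        # Truth step: weighted average of the claims for each entity.
        new_truths = weights @ claims / weights.sum()
        converged = np.max(np.abs(new_truths - truths)) < tol
        truths = new_truths
        if converged:
            break
    return truths, weights
```

Calling `discover_truths` on a matrix in which two rows agree closely and a third row deviates strongly should down-weight the third source and pull the estimate toward the two consistent ones. Unlike the approach summarized in the abstract, this heuristic comes with no convergence guarantee.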
