T-Crowd: Effective Crowdsourcing for Tabular Data

Authors: Nikos Mamoulis, Yudian Zheng, Reynold Cheng, Guoliang Li, Zhipeng Huang

DOI:

Keywords: Computer science, Task (project management), Inference, Sentiment analysis, Information retrieval, Set (abstract data type), Order (business), Categorical variable, Data mining, Star (graph theory), Crowdsourcing

Abstract: Crowdsourcing employs human workers to solve computer-hard problems, such as data cleaning, entity resolution, and sentiment analysis. When crowdsourcing tabular data, e.g., the attribute values of an entity set, a worker's answers on different attributes (e.g., the nationality and age of a celebrity star) are often treated independently. This assumption is not always true and can lead to suboptimal crowdsourcing performance. In this paper, we present the T-Crowd system, which takes into consideration the intricate relationships among tasks, in order to converge faster to their true values. Particularly, T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values. The attribute relationship information is also used to guide task allocation to workers. Finally, T-Crowd seamlessly supports categorical and continuous attributes, which are the two main datatypes found in typical databases. Our extensive experiments on real and synthetic datasets show that T-Crowd outperforms state-of-the-art methods in terms of truth inference and reducing the cost of crowdsourcing.
