DOCS

Authors: Yudian Zheng, Guoliang Li, Reynold Cheng

DOI: 10.14778/3025111.3025118

Keywords: Quality (business), Human–computer interaction, Leverage (statistics), Knowledge base, Computer science, Domain knowledge, Domain (software engineering), Inference, Artificial intelligence, Task (project management), Crowdsourcing

Abstract: Crowdsourcing is a new computing paradigm that harnesses human effort to solve computer-hard problems, such as entity resolution and photo tagging. The crowd (or workers) have diverse qualities, and it is important to effectively model a worker's quality. Most existing worker models assume that workers have the same quality on different tasks. In practice, however, tasks belong to a variety of diverse domains, and workers have different qualities on different domains. For example, a worker who is a basketball fan should have better quality for the task of labeling a photo related to 'Stephen Curry' than for one related to 'Leonardo DiCaprio'. In this paper, we study how to leverage domain knowledge to accurately model a worker's quality. We examine using a knowledge base (KB), e.g., Wikipedia and Freebase, to detect the domains of tasks and workers. We develop Domain Vector Estimation, which analyzes the domains of a task with respect to the KB. We also study Truth Inference, which utilizes the domain-sensitive worker model to accurately infer the true answer of a task. We design an Online Task Assignment algorithm, which judiciously and efficiently assigns tasks to appropriate workers. To implement these solutions, we have built DOCS, a system deployed on Amazon Mechanical Turk. Experiments show that DOCS performs much better than the state-of-the-art approaches.
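
The idea of domain-sensitive truth inference can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a simple weighted vote under assumed inputs: a task's domain vector (its relatedness to each domain, summing to 1) and each worker's per-domain accuracy. The function name `infer_truth`, the two domains, and all numbers are hypothetical.

```python
from collections import defaultdict


def infer_truth(domain_vector, answers, worker_quality):
    """Return the label with the highest domain-weighted support.

    domain_vector: list of floats summing to 1 (task relatedness per domain).
    answers: dict mapping worker id -> the label that worker chose.
    worker_quality: dict mapping worker id -> list of per-domain accuracies.
    All three structures are assumptions made for this illustration.
    """
    scores = defaultdict(float)
    for worker, label in answers.items():
        # Expected accuracy of this worker on this task: per-domain
        # accuracy averaged with the task's domain vector as weights.
        weight = sum(r * q for r, q in zip(domain_vector, worker_quality[worker]))
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical domains: index 0 = sports, index 1 = film.
worker_quality = {
    "w1": [0.9, 0.5],  # strong on sports
    "w2": [0.3, 0.9],  # strong on film
    "w3": [0.4, 0.8],  # strong on film
}
answers = {"w1": "A", "w2": "B", "w3": "B"}

sports_task = [0.9, 0.1]  # e.g. a photo related to a basketball player
film_task = [0.1, 0.9]    # e.g. a photo related to an actor

print(infer_truth(sports_task, answers, worker_quality))
print(infer_truth(film_task, answers, worker_quality))
```

With these made-up numbers, the single sports expert outweighs the two film experts on the sports task, so the result differs from a plain majority vote. On the film task the two film experts prevail.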
