Cost-effective data annotation using game-based crowdsourcing

Authors: Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu

DOI: 10.14778/3275536.3275541

Abstract: Large-scale data annotation is indispensable for many applications, such as machine learning and data integration. However, existing annotation solutions either incur expensive cost for large datasets or produce noisy results. This paper introduces a cost-effective annotation approach and focuses on the labeling rule generation problem, which aims to generate high-quality rules that largely reduce the labeling cost while preserving quality. To address the problem, we first generate candidate rules, and then devise a game-based crowdsourcing approach, CROWDGAME, to select high-quality rules by considering coverage and precision. CROWDGAME employs two groups of crowd workers: one group answers rule validation tasks (whether a rule is valid) to play the role of rule generator, while the other group answers tuple checking tasks (whether the annotated label of a data tuple is correct) to play the role of rule refuter. We let the two groups play a two-player game: the rule generator identifies high-quality rules with large coverage and precision, while the rule refuter tries to refute its opponent by checking some tuples that provide enough evidence to reject rules covering the tuples. This paper studies the challenges in CROWDGAME. The first is to balance the trade-off between coverage and precision. We define the loss of a rule by considering the two factors. The second is rule precision estimation. We utilize Bayesian estimation to combine both rule validation and tuple checking tasks. The third is to select crowdsourcing tasks to fulfill the game-based framework for minimizing the loss. We introduce a minimax strategy and develop efficient task selection algorithms. We conduct experiments on entity matching and relation extraction, and the results show that our method outperforms state-of-the-art solutions.
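The abstract mentions two ingredients that a small example can make concrete: estimating a rule's precision from both kinds of crowd answers, and a loss that trades coverage against precision. The following is a minimal sketch under stated assumptions, not the paper's formulation. The Beta prior, the `vote_weight` scaling of validation votes, the weight `gamma`, and the names `RuleEvidence`, `posterior_precision`, and `rule_loss` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleEvidence:
    """Crowd evidence for one candidate labeling rule (hypothetical structure)."""
    covered: int         # number of tuples the rule annotates
    validation_yes: int  # rule validation answers saying the rule is valid
    validation_no: int   # rule validation answers saying the rule is invalid
    checks_correct: int  # tuple checking answers confirming the rule's label
    checks_wrong: int    # tuple checking answers refuting the rule's label


def posterior_precision(ev, vote_weight=1.0, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the rule's precision under an assumed Beta prior.

    Assumption: validation votes count as pseudo-observations scaled by
    vote_weight, and each tuple check counts as one Bernoulli observation.
    """
    a = prior_a + vote_weight * ev.validation_yes + ev.checks_correct
    b = prior_b + vote_weight * ev.validation_no + ev.checks_wrong
    return a / (a + b)


def rule_loss(ev, total_tuples, gamma=0.5):
    """Toy loss: weighted sum of uncovered and expected mislabeled fractions.

    Assumption: gamma in [0, 1] sets how much missing coverage costs relative
    to expected labeling errors; lower loss is better.
    """
    precision = posterior_precision(ev)
    uncovered = (total_tuples - ev.covered) / total_tuples
    mislabeled = ev.covered * (1.0 - precision) / total_tuples
    return gamma * uncovered + (1.0 - gamma) * mislabeled


# A rule covering 50 of 100 tuples, with mostly supportive crowd answers.
example = RuleEvidence(covered=50, validation_yes=2, validation_no=0,
                       checks_correct=6, checks_wrong=2)
print(posterior_precision(example))
print(rule_loss(example, total_tuples=100))
```

In this sketch a refuting tuple check lowers the estimated precision and therefore raises the loss of every rule covering that tuple, which mirrors the refuter's role described in the abstract. The minimax task selection is not modeled here.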
