Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria

Authors: Pei-Yun Hsueh, Prem Melville, Vikas Sindhwani

DOI: 10.3115/1564131.1564137

Keywords: Natural language processing, Ambiguity, Machine learning, Selection (linguistics), Artificial intelligence, Computer science, The Internet, Annotation, Data quality, Crowdsourcing

Abstract: Annotation acquisition is an essential step in training supervised classifiers. However, manual annotation is often time-consuming and expensive. The possibility of recruiting annotators through Internet services (e.g., Amazon Mechanical Turk) is an appealing option that allows multiple labeling tasks to be outsourced in bulk, typically with low overall costs and fast completion rates. In this paper, we consider the difficult problem of classifying sentiment in political blog snippets. Annotation data from both expert annotators in a research lab and non-expert annotators recruited from the Internet are examined. Three selection criteria are identified for selecting high-quality annotations: noise level, sentiment ambiguity, and lexical uncertainty. Analysis confirms the utility of these criteria in improving data quality. We conduct an empirical study to examine the effect of noisy annotations on the performance of sentiment classification models, and evaluate annotation selection in terms of both accuracy and efficiency.
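The three selection criteria admit simple computational proxies. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it uses majority vote as a stand-in for gold labels, measures an annotator's noise level as disagreement with that majority, measures a snippet's ambiguity as the entropy of its label distribution, and proxies lexical uncertainty by the weakness of a toy lexicon's net sentiment score. The threshold, the lexicon, and all data are illustrative.

```python
import math
from collections import Counter

# Toy annotations: snippet_id -> {annotator_id: sentiment label}
annotations = {
    "s1": {"a1": "pos", "a2": "pos", "a3": "neg"},
    "s2": {"a1": "neg", "a2": "neg", "a3": "neg"},
    "s3": {"a1": "pos", "a2": "neg", "a3": "neu"},
}

def majority_label(labels):
    """Most frequent label; a cheap stand-in for a gold standard."""
    return Counter(labels).most_common(1)[0][0]

def annotator_noise(annotations):
    """Noise level: each annotator's disagreement rate with the majority vote."""
    errors, totals = Counter(), Counter()
    for labels in annotations.values():
        gold = majority_label(labels.values())
        for ann, lab in labels.items():
            totals[ann] += 1
            errors[ann] += lab != gold
    return {ann: errors[ann] / totals[ann] for ann in totals}

def snippet_ambiguity(labels):
    """Ambiguity: entropy (in bits) of a snippet's label distribution."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}  # illustrative only

def lexical_uncertainty(text):
    """Lexical uncertainty: near 1.0 when lexicon evidence is weak or mixed."""
    score = sum(LEXICON.get(tok, 0) for tok in text.lower().split())
    return 1.0 / (1.0 + abs(score))

# Selection: keep labels from low-noise annotators; flag ambiguous snippets.
noise = annotator_noise(annotations)
for sid, labels in annotations.items():
    amb = snippet_ambiguity(labels.values())
    kept = [a for a in labels if noise[a] < 0.5]  # hypothetical threshold
    print(sid, f"ambiguity={amb:.2f}", "kept annotators:", kept)
```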
