Authors: Pei-Yun Hsueh, Prem Melville, Vikas Sindhwani
Keywords: Natural language processing, Ambiguity, Machine learning, Selection (linguistics), Artificial intelligence, Computer science, The Internet, Annotation, Data quality, Crowdsourcing
Abstract: Annotation acquisition is an essential step in training supervised classifiers. However, manual annotation is often time-consuming and expensive. The possibility of recruiting annotators through Internet services (e.g., Amazon Mechanical Turk) is an appealing option that allows multiple labeling tasks to be outsourced in bulk, typically with low overall costs and fast completion rates. In this paper, we consider the difficult problem of classifying sentiment in political blog snippets. Annotation data from both expert annotators in a research lab and non-expert annotators recruited over the Internet are examined. Three selection criteria are identified for selecting high-quality annotations: noise level, sentiment ambiguity, and lexical uncertainty. Analysis confirms the utility of these criteria in improving data quality. We conduct an empirical study to examine the effect of noisy annotations on the performance of sentiment classification models, and evaluate annotation selection in terms of accuracy and efficiency.
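The noise-level criterion can be made concrete as an annotator's rate of disagreement with the per-item majority vote. Below is a minimal Python sketch of that idea; the toy `annotations` dict and the exact noise definition are illustrative assumptions, not the paper's data or formulation.

```python
from collections import Counter

# Hypothetical toy data: snippet_id -> {annotator_id: label}.
# The paper's real political-blog snippets are not reproduced here.
annotations = {
    "snip1": {"a1": "pos", "a2": "pos", "a3": "neg"},
    "snip2": {"a1": "neg", "a2": "neg", "a3": "neg"},
    "snip3": {"a1": "pos", "a2": "neg", "a3": "neg"},
}

def majority_label(labels):
    """Return the most common label among the given annotations."""
    return Counter(labels).most_common(1)[0][0]

def annotator_noise(annotations):
    """Score each annotator by how often their label disagrees with
    the per-snippet majority vote (higher = noisier)."""
    disagreements, totals = Counter(), Counter()
    for labels in annotations.values():
        majority = majority_label(labels.values())
        for annotator, label in labels.items():
            totals[annotator] += 1
            disagreements[annotator] += int(label != majority)
    return {a: disagreements[a] / totals[a] for a in totals}

print(annotator_noise(annotations))
# e.g. {'a1': 0.333..., 'a2': 0.0, 'a3': 0.333...}
```

Annotators whose noise score exceeds a chosen threshold could then be filtered out before training, one plausible way to operationalize the abstract's annotation-selection step.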