Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines

Authors: Kalina Bontcheva, Marta Sabou, Leon Derczynski, Arno Scharl

Abstract: Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying in all its key genres (paid-for crowdsourcing, games with a purpose, volunteering-based approaches), the community still lacks a set of best-practice guidelines similar to the annotation best practices for traditional, expert-based corpus acquisition. In this paper we focus on the use of crowdsourcing methods for corpus acquisition and propose a set of best-practice guidelines based on our own experiences in this area and an overview of related literature. We also introduce GATE Crowd, a plugin of the GATE platform that relies on these guidelines and offers tool support for using crowdsourcing in a more principled and efficient manner.