Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora

Authors: Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, Alessandro Marchetti, Matteo Negri

Abstract: We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need for large-scale annotation efforts for textual entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote the recourse to crowdsourcing as an effective way to reduce costs without sacrificing quality. We show that a complex task, for which even experts usually feature low agreement scores, can be effectively decomposed into simple subtasks assigned to non-expert annotators. The resulting dataset, obtained from a pipeline of different jobs routed to Amazon Mechanical Turk, contains more than 1,600 aligned pairs for each combination of texts-hypotheses in English, Italian and German.
