Authors: Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, Alessandro Marchetti, Matteo Negri
DOI:
Keywords:
Abstract: We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need for large-scale annotation efforts for entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote the recourse to crowdsourcing as an effective way to reduce costs without sacrificing quality. We show that a complex task, for which even experts usually obtain low agreement scores, can be effectively decomposed into simple subtasks assigned to non-expert annotators. The resulting dataset, obtained from a pipeline of different jobs routed to Amazon Mechanical Turk, contains more than 1,600 aligned pairs for each combination of texts and hypotheses in English, Italian and German.