Authors: David Chen, William B. Dolan
DOI:
Keywords: Data mining, Paraphrase, Artificial intelligence, Computer science, Measure (data warehouse), Simple (philosophy), Scale (map), Natural language processing, Field (computer science), Machine translation
Abstract: A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and the lexical dissimilarity of paraphrase candidates. In addition to being efficient to compute, experiments show that these metrics correlate with human judgments.
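
Illustration: the abstract describes scoring paraphrase candidates with simple n-gram comparisons, one for semantic adequacy and one for lexical dissimilarity. The sketch below shows one way such comparisons could look and is not necessarily the authors' exact formulation. It assumes whitespace-tokenized input, clipped n-gram precision against parallel references as a stand-in for adequacy, and one minus n-gram precision against the source sentence as a stand-in for dissimilarity. The function names (`ngram_precision`, `adequacy`, `dissimilarity`) and the default `max_n=2` are hypothetical choices made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference (clipped counts)."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        # Candidate is shorter than n tokens: no n-grams to match.
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def adequacy(candidate, references, max_n=2):
    """Assumed adequacy proxy: best-matching reference per order, averaged over orders."""
    scores = [
        max(ngram_precision(candidate, ref, n) for ref in references)
        for n in range(1, max_n + 1)
    ]
    return sum(scores) / len(scores)


def dissimilarity(candidate, source, max_n=2):
    """Assumed dissimilarity proxy: share of candidate n-grams absent from the source."""
    scores = [
        1.0 - ngram_precision(candidate, source, n)
        for n in range(1, max_n + 1)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up sentences, used only to exercise the functions.
    source = "the cat sat on the mat".split()
    candidate = "a cat is sitting on the mat".split()
    references = [
        "a cat sits on the mat".split(),
        "the cat is on the mat".split(),
    ]
    print("adequacy:", adequacy(candidate, references))
    print("dissimilarity:", dissimilarity(candidate, source))
```

A good paraphrase under this sketch scores high on both measures: it shares many n-grams with other descriptions of the same content while sharing few with the sentence it rewrites. Having several parallel references per item is what makes plain n-gram matching a plausible adequacy signal.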