Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale

Authors: Lucia Specia, Ozan Caglayan, Pranava Madhyastha

Keywords: Usability, Image (mathematics), Artificial intelligence, Yield (finance), Natural language processing, Closed captioning, Machine translation, Know-how, Test set, Computer science

Abstract: … of language generation systems is a well-studied problem in Natural Language Processing. … important failure cases on multiple datasets, language pairs and tasks. Our experiments …

References (19)

Andrej Karpathy, Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Computer Vision and Pattern Recognition, pp. 3128-3137 (2015). 10.1109/CVPR.2015.7298932

Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh. CIDEr: Consensus-based image description evaluation. Computer Vision and Pattern Recognition, pp. 4566-4575 (2015). 10.1109/CVPR.2015.7299087

Ehud Reiter, Anja Belz. An investigation into the validity of some metrics for automatically evaluating natural language generation systems. Computational Linguistics, vol. 35, pp. 529-558 (2009). 10.1162/COLI.2009.35.4.35405

Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL '02, pp. 311-318 (2001). 10.3115/1073083.1073135

Michael Denkowski, Alon Lavie. Meteor Universal: Language Specific Translation Evaluation for Any Target Language. Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 376-380 (2014). 10.3115/V1/W14-3348

Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. Meeting of the Association for Computational Linguistics, pp. 74-81 (2004).

Ryu Iida, Tsutomu Hirao, Manabu Okumura, Katsumasa Yoshikawa. Sentence Compression with Semantic Role Constraints. Meeting of the Association for Computational Linguistics, vol. 2, pp. 349-353 (2012).

David Chen, William B Dolan. Collecting Highly Parallel Data for Paraphrase Evaluation. Meeting of the Association for Computational Linguistics, pp. 190-200 (2011).

Lucia Specia, Stella Frank, Khalil Sima'an, Desmond Elliott. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, vol. 2, pp. 543-553 (2016). 10.18653/V1/W16-2346

Colin Cherry, George Foster, Pierre Isabelle. A Challenge Set Approach to Evaluating Machine Translation. arXiv: Computation and Language (2017).