Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale

Authors: Lucia Specia, Ozan Caglayan, Pranava Madhyastha

DOI:

Keywords: Usability, Image (mathematics), Artificial intelligence, Yield (finance), Natural language processing, Closed captioning, Machine translation, Know-how, Test set, Computer science

Abstract: … of language generation systems is a well-studied problem in Natural Language Processing. … important failure cases on multiple datasets, language pairs and tasks. Our experiments …

References (19)
Andrej Karpathy, Li Fei-Fei, Deep visual-semantic alignments for generating image descriptions. Computer Vision and Pattern Recognition, pp. 3128-3137 (2015), 10.1109/CVPR.2015.7298932
Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh, CIDEr: Consensus-based image description evaluation. Computer Vision and Pattern Recognition, pp. 4566-4575 (2015), 10.1109/CVPR.2015.7299087
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu, BLEU. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL '02, pp. 311-318 (2002), 10.3115/1073083.1073135
Michael Denkowski, Alon Lavie, Meteor Universal: Language Specific Translation Evaluation for Any Target Language. Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 376-380 (2014), 10.3115/V1/W14-3348
Chin-Yew Lin, ROUGE: A Package for Automatic Evaluation of Summaries. Meeting of the Association for Computational Linguistics, pp. 74-81 (2004)
Ryu Iida, Tsutomu Hirao, Manabu Okumura, Katsumasa Yoshikawa, Sentence Compression with Semantic Role Constraints. Meeting of the Association for Computational Linguistics, vol. 2, pp. 349-353 (2012)
David Chen, William B Dolan, Collecting Highly Parallel Data for Paraphrase Evaluation. Meeting of the Association for Computational Linguistics, pp. 190-200 (2011)
Lucia Specia, Stella Frank, Khalil Sima'an, Desmond Elliott, A Shared Task on Multimodal Machine Translation and Crosslingual Image Description. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, vol. 2, pp. 543-553 (2016), 10.18653/V1/W16-2346
Colin Cherry, George Foster, Pierre Isabelle, A Challenge Set Approach to Evaluating Machine Translation. arXiv: Computation and Language (2017)