Data Troubles in Sentence Level Confidence Estimation for Machine Translation.

Authors: Ciprian Chelba, Jeff Klingner, Junpei Zhou, Hideto Kazawa, Mengmeng Niu

Abstract: The paper investigates the feasibility of confidence estimation (CE) for neural machine translation models operating at the high end of the performance spectrum. As a side product of the data annotation process necessary for building such models, we propose sentence level accuracy $SACC$ as a simple, self-explanatory evaluation metric for the quality of translation. Experiments on two different annotator pools, one comprised of non-expert (crowd-sourced) and one of expert (professional) translators, show that $SACC$ can vary greatly depending on the translation proficiency of the annotators, despite the fact that both pools are about equally reliable according to Krippendorff's alpha metric; the relatively low values of inter-annotator agreement confirm the expectation that sentence-level binary labeling $good$ / $needs\ work$ for translation out of context is very hard. For an English-Spanish translation model operating at $SACC = 0.89$ according to a non-expert annotator pool, we can derive a confidence estimate that labels 0.5-0.6 of the $good$ translations in an "in-domain" test set with 0.95 Precision. Switching to an expert annotator pool decreases $SACC$ dramatically: $0.61$ for English-Spanish, measured on the exact same data as above. This forces us to lower the CE model operating point to 0.9 Precision while labeling correctly about 0.20-0.25 of the $good$ translations in the data. We find surprising the extent to which CE depends on the level of proficiency of the annotator pool used for labeling the data. This leads to an important recommendation we wish to make when tackling CE modeling in practice: it is critical to match the end-user expectation of translation quality in the desired domain with the demands of the annotators assigning binary quality labels to the CE training data.
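The two quantities the abstract reports, $SACC$ and the share of $good$ translations a confidence estimate can label at a fixed Precision, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that $SACC$ is the fraction of translations annotated $good$, and that the operating point is chosen by thresholding a per-sentence confidence score; the function names `sacc` and `recall_at_precision` and the toy data are hypothetical.

```python
from typing import List, Tuple


def sacc(labels: List[bool]) -> float:
    """Assumed definition: fraction of translations annotated as good (True)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


def recall_at_precision(
    scores: List[float], labels: List[bool], target: float
) -> Tuple[float, float]:
    """Find the lowest threshold whose accepted set (score >= threshold)
    reaches the target precision, and the recall of good translations there.

    Returns (inf, 0.0) when no threshold reaches the target.
    """
    total_good = sum(labels)
    # Walk from the most confident sentence to the least confident one.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = (float("inf"), 0.0)
    accepted_good = 0
    for i, (score, is_good) in enumerate(ranked, start=1):
        accepted_good += is_good
        # Tied scores share one threshold: evaluate only after the last of them.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = accepted_good / i
        if total_good and precision >= target:
            # Recall only grows as the threshold drops, so keep the latest hit.
            best = (score, accepted_good / total_good)
    return best


# Toy data (hypothetical): confidence scores and binary annotator labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [True, True, True, False, True, False]

print(sacc(labels))
print(recall_at_precision(scores, labels, target=0.95))
```

Under these assumptions, relabeling the same sentences with a stricter annotator pool changes `labels` only, which lowers `sacc` and usually the recall reachable at a given precision.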

References (23)
Marco Turchi, Matteo Negri, Marcello Federico. Data-driven annotation of binary MT quality estimation corpora based on human post-editions. Machine Translation, vol. 28, pp. 281-308 (2014). DOI: 10.1007/S10590-014-9162-Z
Ilya Sutskever, Quoc V. Le, Oriol Vinyals. Sequence to Sequence Learning with Neural Networks. Neural Information Processing Systems, vol. 27, pp. 3104-3112 (2014)
John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis, Nicola Ueffing. Confidence estimation for machine translation. Proceedings of the 20th International Conference on Computational Linguistics - COLING '04, pp. 315-321 (2004). DOI: 10.3115/1220355.1220401
Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, John Makhoul. A Study of Translation Edit Rate with Targeted Human Annotation. Conference of the Association for Machine Translation in the Americas, pp. 223-231 (2006)
Matteo Negri, Marcello Federico, Marco Turchi. Coping with the Subjectivity of Human Judgements in MT Quality Estimation. Workshop on Statistical Machine Translation, pp. 240-251 (2013)
Rebecca J. Passonneau, Bob Carpenter. The Benefits of a Model of Annotation. Transactions of the Association for Computational Linguistics, vol. 2, pp. 311-326 (2014). DOI: 10.1162/TACL_A_00185
Marc'Aurelio Ranzato, Michael Auli, Myle Ott, David Grangier. Analyzing Uncertainty in Neural Machine Translation. arXiv: Computation and Language (2018)
Peter I. Frazier. A Tutorial on Bayesian Optimization. arXiv: Machine Learning (2018)
Rebecca Knowles, Philipp Koehn. Lightweight Word-Level Confidence Estimation for Neural Interactive Translation Prediction. Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing, pp. 35-40 (2018)