The impact of evaluation scenario development on the quantitative performance of speech translation systems prescribed by the SCORE framework

Authors: Brian A. Weiss, Craig Schlenoff

DOI: 10.1145/1865909.1865957

Abstract: The Defense Advanced Research Projects Agency's (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program is a focused advanced technology research and development program. The intent of this program is to demonstrate the capability to rapidly develop and implement free-form, two-way, speech-to-speech spoken language translation systems that allow speakers of different languages to communicate with each other in real-world tactical situations without the need for an interpreter. The National Institute of Standards and Technology (NIST), with support from the Mitre Corporation and Appen Pty Limited, has been funded by DARPA to evaluate the TRANSTAC technologies since 2006. The NIST-led Independent Evaluation Team (IET) has numerous responsibilities in this ongoing effort, including collecting and processing training data, designing and implementing performance evaluations, and analyzing test data. In order to design and execute fair and relevant evaluations, the NIST IET employed the System, Component, and Operationally-Relevant Evaluation (SCORE) framework. The SCORE framework is a unified set of criteria and tools built around the premise that, to gain an understanding of how a technology would perform in its intended environment, it must be evaluated at both the component and system levels and further tested in operationally-relevant environments, while capturing both quantitative and qualitative data. Since a key evaluation goal is to capture quantitative performance data on the TRANSTAC technologies, the IET developed and implemented two forms of SCORE-inspired live evaluation scenarios. These two forms of scenarios have unique impacts on system performance. This paper presents the scenario development methodology, as well as the scenarios' influence on quantitative performance.
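
The abstract's emphasis on capturing quantitative performance data can be made concrete with a small example. The sketch below computes word error rate (WER), one standard quantitative metric for the speech-recognition stage of speech-to-speech translation systems. It is illustrative only: the choice of metric, the function name, and the sample sentences are assumptions for this sketch, not details taken from the paper or the TRANSTAC evaluation protocol.

```python
# Illustrative sketch (assumption, not from the paper): word error rate (WER),
# a common quantitative metric for the speech-recognition stage of
# speech-to-speech translation systems.
def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    normalized by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming:
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Two substitutions against a five-word reference -> WER = 0.4.
print(wer("where is the nearest checkpoint",
          "where is nearest the checkpoint"))  # 0.4
```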
