Comparative Evaluation and Shared Tasks for NLG in Interactive Systems

Authors: Anja Belz, Helen Hastie

DOI: 10.1017/CBO9780511844492.013

Keywords: Speech technology, Computer science, Automatic summarization, Natural language generation, Document retrieval, Artificial intelligence, Referring expression generation, Natural language processing, Parsing, Language model, Referring expression

Abstract: Natural Language Generation (NLG) has strong evaluation traditions, in particular in the area of user evaluations of NLG-based application systems, as conducted for example in the M-PIRO (Isard et al., 2003), COMIC (Foster and White, 2005), and SumTime (Reiter and Belz, 2009) projects. There are also examples of embedded NLG components being compared to non-NLG baselines, including, e.g., the DIAG (Di Eugenio et al., 2002), STOP (Reiter et al., 2003b), and SkillSum (Williams and Reiter, 2008) evaluations, and of comparisons between different versions of the same component, as in ILEX (Cox et al., 1999), SPoT (Rambow et al., 2001), and CLASSiC (Janarthanam et al., 2011). Starting with Langkilde and Knight's work (Knight and Langkilde, 2000), automatic evaluation against reference texts also began to be used, especially for surface realization. What was missing, until 2006, were comparative results for directly comparable, but independently developed, systems. In 1981, Sparck Jones wrote that information retrieval (IR) lacked consolidation and the ability to progress collectively, and that this was substantially because there was no commonly agreed framework for describing and evaluating systems (Sparck Jones, 1981, p. 245). Since then, various sub-disciplines of natural language processing (NLP) and speech technology have consolidated and progressed collectively through developing common task definitions and evaluation frameworks, often in the context of shared-task evaluation campaigns (STECs), and have achieved successful commercial deployment of a range of technologies (e.g., speech recognition software, document retrieval, dialogue systems).

References (88)
Michael J. Trolio, Barbara Di Eugenio, Michael Glass. The DIAG experiments: Natural Language Generation for Intelligent Tutoring Systems. International Conference on Natural Language Generation, pp. 120–127 (2002).
Oliver Lemon, Srini Janarthanam. The GRUVE Challenge: Generating Routes under Uncertainty in Virtual Environments. Natural Language Generation, pp. 208–211 (2011).
Somayajulu G. Sripada, Ian Davy, Jin Yu, WNI Oceanroutes. SUMTIME-METEO: Parallel Corpus of Naturally Occurring Forecast Texts and Weather Data (2008).
Maxine Eskenazi, Antoine Raux, Diane J. Litman, Hua Ai, Dan Bohus. Comparing Spoken Dialog Corpora Collected with Recruited Subjects versus Real Users. Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 124–131 (2007).
Alexander I. Rudnicky, Rebecca J. Passonneau, Bryan L. Pellom, Alexandros Potamianos, Elizabeth Owen Bratt, David Stallard, John S. Aberdeen, Marilyn A. Walker, Gregory A. Sanders, Salim Roukos, Helen Wright Hastie, Rashmi Prasad, John S. Garofolo, Audrey N. Le, Stephanie Seneff. DARPA Communicator: Cross-system results for the 2001 evaluation. Conference of the International Speech Communication Association (2002).
Alexander I. Rudnicky, Rebecca J. Passonneau, Bryan L. Pellom, Alexandros Potamianos, Elizabeth Owen Bratt, David Stallard, John S. Aberdeen, Marilyn A. Walker, Gregory A. Sanders, Salim Roukos, Helen Wright Hastie, Rashmi Prasad, John S. Garofolo, Audrey N. Le, Stephanie Seneff. DARPA Communicator Evaluation: Progress from 2000 to 2001. Conference of the International Speech Communication Association (2002).
Hélène Bonneau-Maynard, Laurence Devillers, Sophie Rosset. Predictive Performance of Dialog Systems. Language Resources and Evaluation (2000).