Training a statistical surface realiser from automatic slot labelling

Authors: Heriberto Cuayáhuitl, Nina Dethlefs, Helen Hastie, Xingkun Liu

DOI: 10.1109/SLT.2014.7078559

Keywords: Similarity (geometry), Quality (business), Training set, Surface (mathematics), Natural language processing, Online learning, Unlabelled data, Function (mathematics), Artificial intelligence, Computer science, Pattern recognition, Labelling

Abstract: Training a statistical surface realiser typically relies on labelled training data or parallel data sets, such as corpora of paraphrases. The procedure for obtaining such data for new domains is not only time-consuming, but it also restricts the incorporation of new semantic slots during an interaction, i.e. using an online learning scenario for automatically extended domains. Here, we present an alternative approach to statistical surface realisation from unlabelled data through automatic semantic slot labelling. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. Annotations need to be reliable enough to be utilised within a spoken dialogue system. We compare different similarity functions and evaluate our surface realiser, trained from unlabelled data, in a human rating study. Results confirm that a surface realiser trained from automatic slot labels can lead to outputs of comparable quality to outputs trained from human-labelled inputs.
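The clustering idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the similarity functions, so Jaccard overlap is swapped in for both the lexical and the semantic component, and the names `combined_similarity`, `cluster_clauses`, `semantic_tags`, `alpha`, and `threshold` are hypothetical. The clustering itself is a simple greedy single-pass scheme, not the authors' algorithm.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(
    clause_a: str,
    clause_b: str,
    semantic_tags: Dict[str, str],
    alpha: float = 0.5,
) -> float:
    """Weighted mix of lexical and semantic overlap (assumed form).

    semantic_tags is a hypothetical word -> semantic class lookup;
    alpha weights the lexical part against the semantic part.
    """
    words_a = set(clause_a.lower().split())
    words_b = set(clause_b.lower().split())
    # Semantic view: the classes of the words that have a known tag.
    sem_a = {semantic_tags[w] for w in words_a if w in semantic_tags}
    sem_b = {semantic_tags[w] for w in words_b if w in semantic_tags}
    return alpha * jaccard(words_a, words_b) + (1 - alpha) * jaccard(sem_a, sem_b)


def cluster_clauses(
    clauses: List[str],
    semantic_tags: Dict[str, str],
    threshold: float = 0.4,
    alpha: float = 0.5,
) -> List[List[str]]:
    """Greedy single-pass clustering against each cluster's first clause."""
    clusters: List[List[str]] = []
    for clause in clauses:
        best, best_score = None, threshold
        for cluster in clusters:
            score = combined_similarity(clause, cluster[0], semantic_tags, alpha)
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([clause])  # no cluster is similar enough
        else:
            best.append(clause)
    return clusters


if __name__ == "__main__":
    tags = {"left": "DIRECTION", "right": "DIRECTION", "bank": "LANDMARK",
            "italian": "FOOD", "chinese": "FOOD", "restaurant": "VENUE"}
    data = [
        "turn left at the bank",
        "turn right at the bank",
        "the restaurant serves italian food",
        "the restaurant serves chinese food",
    ]
    for group in cluster_clauses(data, tags):
        print(group)
```

In this sketch, each resulting cluster could be read as one candidate slot pattern whose members share a label; how the paper actually derives labels from clusters is not stated in the abstract.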

References (33)
Benjamin Snyder, Regina Barzilay. Database-text alignment via structured multilabel classification. International Joint Conference on Artificial Intelligence, pp. 1713-1718 (2007).
Gabor Angeli, Percy Liang, Dan Klein. A Simple Domain-Independent Probabilistic Approach to Generation. Empirical Methods in Natural Language Processing, pp. 502-512 (2010).
Mark A. Hall, Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (1999).
Brian Stankiewicz, Benjamin Kuipers, Matt MacMahon. Walk the talk: connecting language, knowledge, and action in route instructions. National Conference on Artificial Intelligence, pp. 1475-1482 (2006).
Nina Dethlefs, Heriberto Cuayáhuitl. Hierarchical reinforcement learning for adaptive text generation. International Conference on Natural Language Generation, pp. 37-45 (2010).
Mirella Lapata, Ioannis Konstas. Unsupervised Concept-to-text Generation with Hypergraphs. North American Chapter of the Association for Computational Linguistics, pp. 752-761 (2012).
Nathan Schneider, Desai Chen, Dipanjan Das, Noah A. Smith. Probabilistic Frame-Semantic Parsing. North American Chapter of the Association for Computational Linguistics, pp. 948-956 (2010).
Kristina Toutanova, Dan Klein, Christopher D. Manning, Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03, pp. 173-180 (2003). DOI: 10.3115/1073445.1073478
Gregory F. Cooper, Edward Herskovits. A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning, vol. 9, pp. 309-347 (1992). DOI: 10.1023/A:1022649401552
Yun-Nung Chen, William Yang Wang, Alexander I. Rudnicky. Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 120-125 (2013). DOI: 10.1109/ASRU.2013.6707716