Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning

Authors: Francois Mairesse, Filip Jurcicek, Milica Gasic, Blaise Thomson, Steve Young

DOI:

Keywords: Artificial intelligence; Set (abstract data type); Generator (mathematics); Rank (computer programming); Statistical model; Dynamic Bayesian network; Natural language processing; Active learning (machine learning); Graphical model; Active learning; Machine learning; Phrase; Computer science

Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents Bagel, a statistical language generator that uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that Bagel can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.
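
As a rough illustration of the certainty-based active learning mentioned in the abstract, the sketch below picks the unlabelled inputs on which a model is least certain, so that they can be sent to annotators first. It is a generic sketch and not the paper's implementation: the function name `select_least_certain`, the dialogue-act strings, and the certainty scores are all hypothetical, and this record does not say how Bagel computes certainty.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_least_certain(
    pool: Sequence[T], certainty: Callable[[T], float], k: int
) -> List[T]:
    """Return the k items in the pool with the lowest certainty score.

    `certainty` is a stand-in for whatever confidence measure the
    trained generator provides (an assumption, not taken from the paper).
    """
    # Sort ascending by certainty, so the least certain items come first.
    return sorted(pool, key=certainty)[:k]


# Hypothetical unlabelled inputs with invented certainty scores.
toy_certainty = {
    "inform(name, area)": 0.91,
    "inform(name, food)": 0.42,
    "reject(price)": 0.17,
}

# Ask annotators about the two inputs the model is least sure of.
queries = select_least_certain(
    list(toy_certainty), lambda act: toy_certainty[act], k=2
)
print(queries)
```

After the selected inputs are annotated, they would be added to the training set and the model retrained before the next selection round.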
