Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers

Authors: François Mairesse, Filip Jurčíček, Milica Gašić, Blaise Thomson, Steve Young

DOI:

Keywords:

Abstract: Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy that is robust to speech understanding errors to be learnt. However, a major challenge in POMDP policy learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian Processes in Reinforcement Learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail task and then apply this method to a real-world tourist information task.
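The approach builds on Gaussian Process reinforcement learning (GP-SARSA; Engel et al., 2005, in the reference list below). As a minimal sketch of the core idea, the code below models the Q-function as a GP over joint feature vectors of (belief state, action) pairs, so every value estimate comes with a posterior variance that quantifies the uncertainty of the approximation. It uses plain GP regression with a squared-exponential kernel; the function names, feature encoding, kernel choice and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sq_exp_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of feature vectors.

    Assumption: each row is a joint feature vector for a (belief state,
    action) pair; the paper defines kernels over belief states and
    actions directly rather than over a flat feature vector.
    """
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_q_posterior(X_train, returns, X_query, noise_std=0.1):
    """GP regression posterior over Q-values.

    Returns the posterior mean (the Q-value estimate) and the posterior
    variance (the uncertainty of that estimate) at each query point.
    """
    K = sq_exp_kernel(X_train, X_train) + noise_std**2 * np.eye(len(X_train))
    K_s = sq_exp_kernel(X_train, X_query)      # train-vs-query covariances
    L = np.linalg.cholesky(K)                  # stable inversion of K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, returns))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp_kernel(X_query, X_query)) - (v * v).sum(axis=0)
    return mean, var

# Toy usage: 30 visited points with noisy observed returns, then query
# 5 candidate points for their Q-value estimates and uncertainties.
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 3))                  # hypothetical (belief, action) features
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(30)  # stand-in returns
q_mean, q_var = gp_q_posterior(X, y, rng.uniform(size=(5, 3)))
```

In GP-SARSA proper, returns are not regressed directly as above: temporal differences are propagated through the GP and sparse approximations keep the computation tractable, which is what makes the method fast enough for dialogue-scale policy optimisation.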

References (9)
Thomas Glen Dietterich. Adaptive Computation and Machine Learning. MIT Press (1998).
Damien Ernst, Arthur Louette. Introduction to Reinforcement Learning. MIT Press (1998).
Ronen I. Brafman. A heuristic variable grid solution method for POMDPs. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 727-733 (1997).
Carl Edward Rasmussen, Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press (2006).
Steve Young, Milica Gašić, Simon Keizer, François Mairesse, Jost Schatzmann, Blaise Thomson, Kai Yu. The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management. Computer Speech & Language, vol. 24, pp. 150-174 (2010). DOI: 10.1016/j.csl.2009.04.001.
Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press (1998).
Marc Peter Deisenroth, Carl Edward Rasmussen, Jan Peters. Gaussian process dynamic programming. Neurocomputing, vol. 72, pp. 1508-1524 (2009). DOI: 10.1016/j.neucom.2008.12.019.
Yaakov Engel, Shie Mannor, Ron Meir. Reinforcement learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 201-208 (2005). DOI: 10.1145/1102351.1102377.