Authors: Francois Mairesse, Filip Jurcicek, Milica Gasic, Blaise Thomson, Steve Young
DOI:
Keywords:
Abstract: Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a policy robust to speech understanding errors to be learnt. However, a major challenge in POMDP learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian Processes to Reinforcement Learning of optimal policies, in order to (1) make the learning process faster and (2) obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail task and then apply this method to a real-world tourist information task.
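The central idea in the abstract is that a Gaussian Process gives not only a value estimate but also a measure of how uncertain that estimate is. The sketch below is a minimal illustration of that point, not the paper's actual GP-SARSA-style algorithm: plain GP regression over (belief, action) pairs approximates a Q-function for a toy voicemail-like task and exposes the predictive uncertainty. All numbers, the kernel choice, and the toy framing are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.3, signal_var=1.0):
    """Squared-exponential kernel between two sets of (belief, action) points."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

# Toy training data (hypothetical): belief that the user wants "delete" (in [0, 1])
# plus an action index (0 = confirm, 1 = delete), with noisy sampled returns as targets.
X = np.array([[0.1, 0], [0.5, 0], [0.9, 0],
              [0.1, 1], [0.5, 1], [0.9, 1]], dtype=float)
y = np.array([0.2, 0.5, 0.7, -1.0, 0.1, 1.0])   # observed returns
noise_var = 0.05

# Standard GP regression posterior: precompute the inverse of the noisy Gram matrix.
K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
K_inv = np.linalg.inv(K)

def q_posterior(belief, action):
    """Posterior mean and variance of Q at a query (belief, action) point."""
    x_star = np.array([[belief, action]], dtype=float)
    k_star = rbf_kernel(x_star, X)                              # 1 x N
    mean = k_star @ K_inv @ y
    var = rbf_kernel(x_star, x_star) - k_star @ K_inv @ k_star.T
    return mean.item(), var.item()

for a, name in [(0, "confirm"), (1, "delete")]:
    m, v = q_posterior(0.7, a)
    print(f"Q(belief=0.7, {name}) ~= {m:.2f} +/- {np.sqrt(v):.2f}")
```

The variance returned alongside the mean is the "estimate of the uncertainty of the approximation" the abstract refers to; in a GP-based RL setting it can be used, for example, to direct exploration toward state-action regions where the Q-function is poorly known.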