Authors: Blaise Thomson, Steve Young
DOI: 10.1016/J.CSL.2009.07.003
Keywords:
Abstract: This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded, statistical model of dialogue management. However, exact belief state updates in a POMDP are computationally intractable, so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made which improve its efficiency significantly compared to the original algorithm as well as to other POMDP-based state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems that uses a component-based policy with the episodic Natural Actor Critic algorithm. The proposed framework was tested in both simulations and a user trial. Both indicated that using Bayesian updates of the dialogue state outperforms traditional definitions of the dialogue state. Policy learning worked effectively, and the learned policy outperformed all others in simulations. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state is shown to be a feasible and effective approach to building real-world dialogue systems.
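To illustrate the kind of belief maintenance the abstract refers to, the sketch below shows a minimal Bayesian update for a single dialogue slot: a prediction step allowing the user goal to change, followed by a correction step weighting by a noisy observation likelihood. This is not the paper's algorithm (which factors the full state into a Bayesian network and applies loopy belief propagation); the slot values, persistence probability, and confusion model are all hypothetical.

```python
# Minimal illustrative sketch (not the paper's implementation): an exact
# Bayesian belief update for one dialogue slot. All names and numbers
# (slot values, persistence probability, recogniser accuracy) are assumptions.

def belief_update(belief, observation, obs_model, change_model):
    """One update step: predict (goal may change), then correct with the
    observation likelihood, and renormalise."""
    values = list(belief)
    # Prediction: marginalise over possible goal changes.
    predicted = {
        v: sum(belief[u] * change_model(u, v) for u in values)
        for v in values
    }
    # Correction: weight by the observation likelihood and renormalise.
    unnorm = {v: predicted[v] * obs_model(observation, v) for v in values}
    total = sum(unnorm.values())
    return {v: p / total for v, p in unnorm.items()}


if __name__ == "__main__":
    # Hypothetical "food" slot with a uniform prior over three values.
    belief = {"chinese": 1 / 3, "indian": 1 / 3, "french": 1 / 3}

    # Assume the user goal persists with probability 0.9.
    def change_model(old, new):
        return 0.9 if old == new else 0.1 / 2

    # Assume the recogniser reports the true value with probability 0.7.
    def obs_model(obs, value):
        return 0.7 if obs == value else 0.3 / 2

    belief = belief_update(belief, "indian", obs_model, change_model)
    print(belief)  # probability mass shifts toward "indian"
```

In the full framework the state is factored into many such variables, and approximate message passing (loopy belief propagation) replaces the exact per-slot update shown here.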