Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System

Authors: T. CINCAREK, H. KAWANAMI, R. NISIMURA, A. LEE, H. SARUWATARI

DOI: 10.1093/IETISY/E91-D.3.576

Abstract: In this paper, the development, long-term operation and portability of a practical ASR application in a real environment are investigated. The target is a speech-oriented guidance system installed at a local community center. It has been exposed to ordinary people since November 2002. More than 300 hours, or more than 700,000 inputs, have been collected during four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of data which has to be prepared to build a system with reasonable recognition and response accuracy. Furthermore, the relative importance of developing the main components, i.e. the recognizer and the response generation module, is assessed. Although depending on the system's modeling capacities and the domain complexity, experimental results show that overall performance stagnates after employing about 10–15 k utterances for training the acoustic model, 40–50 k utterances for training the language model and 40–50 k utterances for compiling the question and answer (Q&A) database. The Q&A database was most important for improving the system's response accuracy. Finally, the portability of the well-trained prototype to a different environment, a subway station, is investigated. Since collection and preparation of large amounts of data is impractical in general, only one month of data from the new environment is employed for adaptation. While the speech recognition component has a high degree of portability, the response accuracy is lower in the new environment. The reason is a domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to achieve high user satisfaction.
