The challenge of spoken language systems: Research directions for the nineties

Authors: R. Cole, L. Hirschman, L. Atlas, M. Beckman, A. Biermann

DOI: 10.1109/89.365385

Keywords: Computer science; Adaptation (computer science); Artificial intelligence; Language technology; Speech processing; Human interface device; Natural language processing; Spoken language; Meaning (linguistics); Speech synthesis; Human–computer interaction; Natural language

Abstract: A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support, and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.
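The abstract describes a spoken language system as three stages: recognizing the words, interpreting them as a meaning in terms of the application, and responding to the user. The following is a minimal Python sketch of that flow for an airline-schedule lookup, assuming the "audio" has already been transcribed to text. The function names, the "from X / to Y" keyword rule, the `Meaning` frame, and the flight table are all hypothetical illustrations, not the paper's method or data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy schedule "database" (made-up data for illustration only).
FLIGHTS = [
    {"origin": "boston", "destination": "denver", "departs": "08:15"},
    {"origin": "boston", "destination": "denver", "departs": "17:40"},
    {"origin": "denver", "destination": "boston", "departs": "09:05"},
]
KNOWN_CITIES = {"boston", "denver"}


@dataclass
class Meaning:
    """Application-level meaning: a flight-schedule query frame."""
    origin: Optional[str] = None
    destination: Optional[str] = None


def recognize(transcript: str) -> list[str]:
    """Stage 1 stand-in: a real recognizer maps audio to words; here the
    input is already text, so we only lowercase, drop '?', and tokenize."""
    return transcript.lower().replace("?", "").split()


def interpret(words: list[str]) -> Meaning:
    """Stage 2: map the word sequence to an application meaning with a
    simple rule: 'from <city>' fills origin, 'to <city>' fills destination."""
    meaning = Meaning()
    for prev, word in zip(words, words[1:]):
        if word in KNOWN_CITIES:
            if prev == "from":
                meaning.origin = word
            elif prev == "to":
                meaning.destination = word
    return meaning


def respond(meaning: Meaning) -> str:
    """Stage 3: query the table and build a reply (or ask for missing slots)."""
    if meaning.origin is None or meaning.destination is None:
        return "Which cities are you travelling between?"
    times = [f["departs"] for f in FLIGHTS
             if f["origin"] == meaning.origin
             and f["destination"] == meaning.destination]
    src, dst = meaning.origin.title(), meaning.destination.title()
    if not times:
        return f"There are no flights from {src} to {dst}."
    return f"Flights from {src} to {dst} depart at {', '.join(times)}."


def spoken_language_system(utterance: str) -> str:
    """Chain the three stages: words -> meaning -> response."""
    return respond(interpret(recognize(utterance)))


if __name__ == "__main__":
    print(spoken_language_system("What flights go from Boston to Denver?"))
```

Keeping the stages as separate functions mirrors the decomposition in the abstract: a real system could replace `recognize` with an actual speech recognizer or `interpret` with a full parser without changing the response stage. When a slot is missing, the sketch asks a follow-up question, which hints at why dialogue models appear among the research areas.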

References (130)
Christine Pao, Lynette Hirschman. The cost of errors in a spoken language system. Conference of the International Speech Communication Association, 1993.
Mark A. Fanty, Ronald A. Cole, John Pochmara. An interactive environment for speech recognition research. Conference of the International Speech Communication Association, 1992.
Eric David Petajan. Automatic lipreading to enhance speech recognition (speech reading). University of Illinois at Urbana-Champaign, 1984.
Hans-Wilhelm Rühl, Hans-Günter Hirsch, Peter Meyer. Improved speech recognition using high-pass filtering of subband envelopes. Conference of the International Speech Communication Association, 1991.
Stephanie Seneff. A joint synchrony/mean-rate model of auditory speech processing. Journal of Phonetics, vol. 16, pp. 101-111, 1990. DOI: 10.1016/S0095-4470(19)30466-8
James Emil Flege. Laryngeal timing and phonation onset in utterance-initial English stops. Journal of Phonetics, vol. 10, pp. 177-192, 1982. DOI: 10.1016/S0095-4470(19)30956-8
Philip R. Cohen, Sharon L. Oviatt. The role of voice in human-machine communication. In Voice Communication Between Humans and Machines, pp. 34-75, 1994.
Björn Gambäck, Vassilios Digalakis, Jussi Karlgren, Christer Samuelsson, Jaan Kaja, Stephen G. Pulman, Patti Price, Manny Rayner, Michael Collins, Ivan Bretan, Bertil Lyberg, David M. Carter. Spoken language translation with MID-90's technology: a case study. Conference of the International Speech Communication Association, 1993.
Hynek Hermansky, Phil Kohn, Nelson Morgan, Aruna Bayya. Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP). Conference of the International Speech Communication Association, 1991.