Opportunities for re-convergence of engineering and cognitive science accounts of spoken word recognition

Author: Mark Huckvale

Abstract: As users are only too aware, contemporary large vocabulary speech recognition systems do not respond to speech in the same way as humans. The dictation systems that users use today are very sensitive to disfluencies, restarts, background noise and change of speaker or voice quality. Furthermore, the mistakes they make seem to be different from the ones humans make, even when listening in poor environments. There is no doubt that users will become more comfortable as systems act more like a human listener. This should mean that scientific knowledge about how humans process speech is relevant and important to the design of these systems. Unlike the situation in the early days of the field, it is now the case that research into human processing of spoken language has diverged from research into speech recognition systems. We now have the separate and independent fields of 'psycholinguistics' and 'spoken language engineering'. This article explores the relationship between the engineering and cognitive science communities within the relatively well-defined sub-field of spoken word recognition. That is, we shall be mainly concerned with the processes by which word sequences are recovered from the acoustic input. The article is in three parts: the roots of the divergence between engineering and cognitive accounts are explored in the first part. Differences in motivation, methodology and culture are all seen to play a part in the historical context. The second part discusses the potential benefits of re-convergence of the two fields and argues that the time is ripe for progress now. Engineering systems are stable and successful enough to be worth interpreting in cognitive terms, while cognitive accounts are sophisticated enough to allow useful comparisons to be undertaken. The final part proposes some elements of a joint research programme that could be the stimulus for the two fields to work together. Highlighted are priming phenomena that relate to recent work in adaptation, and morphological phenomena that relate to problems of selection and use. Other possibilities are phonetic reduction at the low end, and semantic grouping and phrasing at the high end, for both human and machine recognition.

References (22)
P. S. Gopalakrishnan, L. R. Bahl. Fast Match Techniques. Springer, Boston, MA, pp. 413-428 (1996). doi:10.1007/978-1-4613-1367-0_17
Dennis H. Klatt. Speech perception: a model of acoustic–phonetic analysis and lexical access. Journal of Phonetics, vol. 7, pp. 279-312 (1979). doi:10.1016/S0095-4470(19)31059-9
Gerry T. M. Altmann. Cognitive models of speech processing: an introduction. Cognitive Models of Speech Processing, pp. 1-23 (1991).
J. Wolf, W. Woods. The HWIM speech understanding system. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 784-787 (1977). doi:10.1109/ICASSP.1977.1170283
Lalit R. Bahl, Frederick Jelinek, Robert L. Mercer. A Maximum Likelihood Approach to Continuous Speech Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, pp. 179-190 (1983). doi:10.1109/TPAMI.1983.4767370
Hervé Bourlard, Hynek Hermansky, Nelson Morgan. Towards increasing speech recognition error rates. Speech Communication, vol. 18, pp. 205-231 (1996). doi:10.1016/0167-6393(96)00003-9
L. D. Erman, F. Hayes-Roth, V. R. Lesser, R. Reddy. The Hearsay-II speech understanding system. Journal of the Acoustical Society of America, vol. 60 (1976). doi:10.1121/1.2003139
J. R. Pierce. Whither Speech Recognition? The Journal of the Acoustical Society of America, vol. 46, pp. 1049-1051 (1969). doi:10.1121/1.1911801