A Weighted Discrete KNN Method for Mandarin Speech and Emotion Recognition

Authors: Yu-Te Chen, Tsang-Long Pao, Wen-Yuan Liao

DOI: 10.5772/6370

Keywords:

Abstract: The speech signal is a rich source of information and conveys more than spoken words; its content can be divided into two main groups: linguistic and nonlinguistic. The linguistic aspects of speech include the properties of the word sequence and deal with what is being said. The nonlinguistic aspects have to do with talker attributes such as age, gender, dialect, and emotion, and with how it is said. Cues are also provided in non-speech vocalizations, such as laughter or crying. The cues investigated in this article were those of audio-visual speech. In conversation, the true meaning of a communication is transmitted not only by the content but by how something is said and which words are emphasized, reflecting the speaker's attitude; the perception of the vocal expressions of others is therefore vital for an accurate understanding of emotional messages (Banse & Scherer, 1996). In the following, we will introduce speech recognition and emotion recognition, which are applications of our proposed weighted discrete K-nearest-neighbor (WD-KNN) method to speech and emotion, respectively. Each recognition task consists of two steps: feature extraction and recognition. In this chapter, we describe the methods used in each part of the system. As post-processing, the effects of different classifiers and weighting schemes on KNN-based recognition are discussed. The overall structure of the system is depicted in Fig. 1, and previous research is briefly reviewed.
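To make the weighted discrete KNN idea concrete, the following is a minimal sketch of a rank-weighted KNN classifier. It is an illustration only: the function name `wd_knn_predict`, the Euclidean distance, and the linearly decreasing rank weights are assumptions for this example, not necessarily the exact weighting scheme evaluated in the chapter.

```python
import numpy as np

def wd_knn_predict(train_X, train_y, x, k=5, weights=None):
    """Classify x by a weighted vote among its k nearest training samples.

    Closer neighbors receive larger weights. The default linearly
    decreasing rank weights [k, k-1, ..., 1] are one illustrative
    choice of weighting scheme.
    """
    if weights is None:
        weights = np.arange(k, 0, -1)          # e.g. k=3 -> [3, 2, 1]
    dists = np.linalg.norm(train_X - x, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]              # indices of k closest samples
    scores = {}
    for w, idx in zip(weights, nearest):
        label = train_y[idx]
        scores[label] = scores.get(label, 0) + w
    return max(scores, key=scores.get)           # label with largest weighted vote

# Toy usage with two 2-D classes (stand-ins for extracted feature vectors)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(wd_knn_predict(X, y, np.array([0.15, 0.1]), k=3))  # prints 0
```

In practice the feature vectors would be acoustic features extracted from each utterance, and the weighting scheme is the design choice whose effect on recognition accuracy the chapter compares.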

References (34)
Eric David Petajan, Automatic Lipreading to Enhance Speech Recognition (Speech Reading), University of Illinois at Urbana-Champaign, (1984)
Jun-Heng Yeh, Yuan-Hao Chang, Tsang-Long Pao, Yu-Te Chen, Emotion Recognition and Evaluation of Mandarin Speech Using Weighted D-KNN Classification, International Conference on Computational Linguistics, pp. 203-212, (2005)
J. Matas, K. Messer, J. Kittler, Gilbert Maître, Juergen Luettin, XM2VTSDB: The Extended M2VTS Database, Proc. Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA'99), (1999)
Robert A. Granat, A Method of Hidden Markov Model Optimization for Use with Geophysical Data Sets, International Conference on Computational Science, pp. 892-901, (2003), 10.1007/3-540-44863-2_88
Gamze Erten, Audio Visual Speech Processing, (2001)
Satoshi Nakamura, Fusion of Audio-Visual Information for Integrated Speech Processing, Lecture Notes in Computer Science, pp. 127-143, (2001), 10.1007/3-540-45344-X_20
Daniel Neiberg, Kjell Elenius, Kornel Laskowski, Emotion Recognition in Spontaneous Speech Using GMMs, International Conference on Spoken Language Processing, pp. 809-812, (2006)
E. Yamamoto, S. Nakamura, K. Shikano, Lip Movement Synthesis from Speech Based on Hidden Markov Models, IEEE International Conference on Automatic Face and Gesture Recognition, pp. 154-159, (1998), 10.1109/AFGR.1998.670941
Iain R. Murray, John L. Arnott, Toward the Simulation of Emotion in Synthetic Speech: A Review of the Literature on Human Vocal Emotion, Journal of the Acoustical Society of America, vol. 93, pp. 1097-1108, (1993), 10.1121/1.405558
K. Chan, J. Hao, O. W. Kwon, Emotion Recognition by Speech Signal, Conference of the International Speech Communication Association, pp. 125-128, (2003)