Robust angry speech detection employing a TEO-based discriminative classifier combination.

Authors: John H. L. Hansen, Wooil Kim

Abstract: This study proposes an effective angry speech detection approach employing Teager energy operator (TEO) based feature extraction. Decorrelation processing is applied to the TEO-based feature to improve model training by reducing both the correlation between feature elements and the vector size. Minimum classification error training is employed to increase discrimination against other stressed speech models. Combination with conventional Mel frequency cepstral coefficients (MFCC) is also used to leverage the effectiveness of MFCC in characterizing the spectral envelope of speech signals. Experimental results on the SUSAS corpus demonstrate that the proposed scheme is effective at increasing accuracy on an open-speaker, open-vocabulary task. An improvement of up to 7.78% in accuracy is obtained by combining the proposed methods: decorrelation of the feature vector, discriminative training, and classifier combination.
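For readers unfamiliar with the operator behind the TEO-based feature, the following is a minimal sketch of the discrete Teager energy operator in its commonly used form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It illustrates only the core operator, not the feature extraction pipeline used in the study, which the abstract does not specify. The function name `teager_energy` and the test tone parameters (amplitude, frequency, sampling rate, phase) are assumptions chosen for illustration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for the interior samples n = 1 .. len(x) - 2,
    so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least three samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Illustrative input: a pure tone (all values below are arbitrary choices).
amplitude = 0.5
omega = 2 * np.pi * 200 / 8000  # 200 Hz tone at an assumed 8 kHz sampling rate
n = np.arange(256)
tone = amplitude * np.cos(omega * n + 0.3)

psi = teager_energy(tone)
```

For a pure tone A cos(omega n + phi), the operator output is the constant A^2 sin^2(omega), so it tracks both amplitude and frequency. This sensitivity to frequency as well as amplitude is one reason the operator is used for speech produced under stress.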
