Modulation Spectral Features for Robust Far-Field Speaker Identification

Authors: T. H. Falk, Wai-Yip Chan

DOI: 10.1109/TASL.2009.2023679

Abstract: In this paper, auditory-inspired modulation spectral features are used to improve automatic speaker identification (ASI) performance in the presence of room reverberation. The modulation spectral signal representation is obtained by first filtering speech with a 23-channel gammatone filterbank. An eight-channel modulation filterbank is then applied to the temporal envelope of each gammatone filter output. Features are extracted from modulation frequency bands ranging from 3 to 15 Hz and are shown to be robust to mismatch between training and testing conditions and to increasing reverberation levels. To demonstrate the gains obtained with the proposed features, experiments are performed with clean speech, artificially generated reverberant speech, and reverberant speech recorded in a meeting room. Simulation results show that a Gaussian mixture model based ASI system trained on the proposed features consistently outperforms a baseline system trained on mel-frequency cepstral coefficients. For multimicrophone applications, three multichannel score combination and adaptive channel selection techniques are investigated to further improve performance.
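The pipeline summarized in the abstract (gammatone filterbank, temporal envelope per channel, modulation filterbank, keep the 3-15 Hz modulation bands) can be illustrated with a minimal NumPy sketch. Everything beyond the channel counts and the 3-15 Hz range is an assumption, not the authors' configuration: acoustic centre frequencies equally spaced on the ERB-rate scale between a hypothetical 125 Hz and 3500 Hz, 4th-order gammatone impulse responses with bandwidth 1.019 ERB, a Hilbert envelope, eight log-spaced modulation bands from 4 to 128 Hz with Q = 2 implemented as energy sums over the envelope spectrum, and band selection by centre frequency. The function names are hypothetical, and the sketch returns one log-energy per (acoustic, modulation) band for the whole signal. The framing, normalization, and any decorrelation used in the paper are not reproduced.

```python
import numpy as np


def erb_space(f_lo, f_hi, n):
    """n centre frequencies equally spaced on the Glasberg-Moore ERB-rate scale."""
    def to_erb_rate(f):
        return 21.4 * np.log10(4.37e-3 * f + 1.0)
    e = np.linspace(to_erb_rate(f_lo), to_erb_rate(f_hi), n)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3  # inverse of to_erb_rate


def gammatone_ir(fc, fs, dur=0.064, order=4):
    """FIR approximation of a gammatone filter (assumed b = 1.019 * ERB(fc))."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))  # unit-energy normalization


def hilbert_envelope(x):
    """Magnitude of the FFT-based analytic signal (temporal envelope)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def modulation_spectral_features(x, fs, n_acoustic=23, f_lo=125.0, f_hi=3500.0,
                                 n_mod=8, fm_lo=4.0, fm_hi=128.0, q=2.0,
                                 keep=(3.0, 15.0)):
    """Log modulation energies, shape (n_acoustic, number of kept modulation bands)."""
    fcs = erb_space(f_lo, f_hi, n_acoustic)
    fms = np.geomspace(fm_lo, fm_hi, n_mod)            # assumed log-spaced centres
    kept = fms[(fms >= keep[0]) & (fms <= keep[1])]     # bands centred in 3-15 Hz
    feats = np.empty((n_acoustic, len(kept)))
    for i, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs))[: len(x)]  # causal filtering
        env = hilbert_envelope(y)
        spec = np.abs(np.fft.rfft(env)) ** 2                # envelope power spectrum
        freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
        for j, fm in enumerate(kept):
            lo, hi = fm * (1 - 1 / (2 * q)), fm * (1 + 1 / (2 * q))
            band = (freqs >= lo) & (freqs < hi)
            feats[i, j] = np.log(spec[band].sum() + 1e-12)
    return feats, fcs, kept
```

With the assumed 4-128 Hz spacing, three modulation bands (centred near 4, 6.6, and 10.8 Hz) fall inside 3-15 Hz. For a GMM-based system, frame-level vectors would be needed, so in practice the function would be applied to successive segments long enough to resolve these low modulation frequencies. That segmentation is a design choice not specified here.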

参考文章(39)
Tiago H. Falk, Wai-Yip Chan, Spectro-temporal features for robust far-field speaker identification. conference of the international speech communication association. pp. 634- 637 ,(2008)
Tiago H. Falk, Wai-Yip Chan, Hua Yuan, Spectro-temporal processing for blind estimation of reverberation time and single-ended quality measurement of reverberant speech. conference of the international speech communication association. pp. 514- 517 ,(2007)
Alex Waibel, Yue Pan, The effects of room acoustics on MFCC speech parameter. conference of the international speech communication association. pp. 129- 132 ,(2000)
T. Arai, M. Pavel, H. Hermansky, C. Avendano, Intelligibility of speech with filtered time trajectories of spectral envelopes international conference on spoken language processing. ,vol. 4, pp. 2490- 2493 ,(1996) , 10.1109/ICSLP.1996.607318
A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, C. Wooters, The ICSI Meeting Corpus international conference on acoustics, speech, and signal processing. ,vol. 1, pp. 364- 367 ,(2003) , 10.1109/ICASSP.2003.1198793
A. Kusumoto, T. Arai, T. Kitamura, M. Takahashi, Y. Murahara, Modulation enhancement of speech as a preprocessing for reverberant chambers with the hearing-impaired international conference on acoustics, speech, and signal processing. ,vol. 2, pp. 853- 856 ,(2000) , 10.1109/ICASSP.2000.859094
J. Gonzalez-Rodriguez, J. Ortega-Garcia, C. Martin, L. Hernandez, Increasing robustness in GMM speaker recognition systems for noisy and reverberant speech with low complexity microphone arrays international conference on spoken language processing. ,vol. 3, pp. 1333- 1336 ,(1996) , 10.1109/ICSLP.1996.607859
J. Ortega-Garcia, J. Gonzalez-Rodriguez, Overview of speech enhancement techniques for automatic speaker recognition international conference on spoken language processing. ,vol. 2, pp. 929- 932 ,(1996) , 10.1109/ICSLP.1996.607754
Rob Drullman, Joost M. Festen, Reinier Plomp, Effect of temporal envelope smearing on speech reception The Journal of the Acoustical Society of America. ,vol. 95, pp. 1053- 1064 ,(1994) , 10.1121/1.408467