A New Adaptive Long-Term Spectral Estimation Voice Activity Detector

Authors: Ángel de la Torre, Javier Ramírez, M. Carmen Benítez, Antonio J. Rubio, José C. Segura

Abstract: This paper presents an efficient voice activity detector (VAD) based on the estimation of the long-term spectral divergence (LTSD) between noise and speech periods. The proposed method decomposes the input signal into overlapped frames, uses a sliding window to compute the long-term spectral envelope, and measures the speech/non-speech LTSD, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors. To increase non-speech detection accuracy, the decision threshold is adapted to the measured noise energy, while a controlled hang-over is activated only when the observed signal-to-noise ratio (SNR) is low. An exhaustive analysis of the proposed VAD is carried out using the AURORA TIdigits and SpeechDat-Car (SDC) databases. The proposed VAD is compared with the most commonly used ones in the field in terms of recognition performance. Experimental results demonstrate a sustained advantage over the G.729, AMR, and AFE VADs.
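The processing chain outlined in the abstract (overlapped frames, a sliding-window spectral envelope, an LTSD measure, a noise-adapted threshold, and a hang-over) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption: the envelope is taken as the per-bin maximum over neighbouring frames, the LTSD as the bin-averaged envelope-to-noise power ratio in dB, and the threshold as a linear interpolation between two operating points. All function and parameter names are hypothetical, and the hang-over is shown as a fixed frame count rather than the SNR-controlled scheme the paper describes.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split x into frames; hop < frame_len gives overlapping frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (stdlib only; use an FFT in practice)."""
    n = len(frame)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def ltsd_track(spectra, noise_spectrum, order):
    """Assumed LTSD per frame, in dB.

    The long-term envelope is the per-bin maximum over frames l-order..l+order
    (clipped at the signal edges); it is compared with the noise spectrum.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    track = []
    for l in range(len(spectra)):
        window = spectra[max(0, l - order):min(len(spectra), l + order + 1)]
        envelope = [max(column) for column in zip(*window)]
        ratio = sum(
            (e * e) / (n * n + eps) for e, n in zip(envelope, noise_spectrum)
        ) / len(envelope)
        track.append(10.0 * math.log10(ratio + eps))
    return track


def adaptive_threshold(noise_energy, e_lo, e_hi, thr_lo, thr_hi):
    """Interpolate the threshold linearly between (e_lo, thr_lo) and (e_hi, thr_hi)."""
    if noise_energy <= e_lo:
        return thr_lo
    if noise_energy >= e_hi:
        return thr_hi
    frac = (noise_energy - e_lo) / (e_hi - e_lo)
    return thr_lo + frac * (thr_hi - thr_lo)


def vad_decisions(ltsd, threshold, hangover=0):
    """Flag a frame as speech if LTSD exceeds the threshold.

    After a speech frame, up to `hangover` following frames stay flagged.
    """
    decisions, remaining = [], 0
    for value in ltsd:
        if value > threshold:
            decisions.append(True)
            remaining = hangover
        elif remaining > 0:
            decisions.append(True)
            remaining -= 1
        else:
            decisions.append(False)
    return decisions
```

In use, the noise spectrum would be estimated from frames known to contain no speech (for example, by averaging the first few magnitude spectra), and the hang-over length would be set to zero when the estimated SNR is high.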
