Robust speech detection and segmentation for real-time ASR applications

Authors: I. Shafran, R. Rose

DOI: 10.1109/ICASSP.2003.1198810

Keywords: Nonparametric statistics, Speech recognition, Bandwidth (signal processing), Voice activity detection, Pattern recognition, Background noise, Short-time Fourier transform, Computer science, Signal-to-noise ratio, Fourier transform, Artificial intelligence, Segmentation

Abstract: This paper provides a solution for robust speech detection that can be applied across a variety of tasks. The solution is based on an algorithm that performs non-parametric estimation of the background noise spectrum using minimum statistics of the smoothed short-time Fourier transform (STFT). It is shown that the new algorithm can operate effectively under varying signal-to-noise ratios. Results are reported on two tasks, HMIHY and SPINE, which differ in their speaking style, background noise type, and bandwidth. With a computational cost of less than 2% of real-time on a 1 GHz P-3 machine and a latency of 400 ms, it is suitable for real-time ASR applications.
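
As a rough illustration of the minimum-statistics idea named in the abstract, the sketch below tracks a noise floor as the sliding minimum of a recursively smoothed power sequence and flags frames whose smoothed power rises well above that floor. It is not the authors' algorithm: it operates on a single per-frame power value rather than on each STFT bin, and the function names (`smooth_power`, `sliding_minimum`, `detect_speech`) and the parameters `alpha`, `window`, and `ratio` are hypothetical choices made only for this example.

```python
from collections import deque


def smooth_power(power, alpha=0.9):
    """First-order recursive smoothing of a per-frame power sequence."""
    smoothed = []
    prev = power[0]
    for p in power:
        prev = alpha * prev + (1.0 - alpha) * p
        smoothed.append(prev)
    return smoothed


def sliding_minimum(values, window):
    """Minimum over the most recent `window` frames (monotonic deque)."""
    mins = []
    candidates = deque()  # indices whose values are strictly increasing
    for i, v in enumerate(values):
        # Drop older candidates that can never be the minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it falls outside the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        mins.append(values[candidates[0]])
    return mins


def detect_speech(power, alpha=0.9, window=40, ratio=3.0):
    """Flag frames whose smoothed power exceeds `ratio` times the noise floor.

    All three parameters are illustrative assumptions, not values from the paper.
    """
    smoothed = smooth_power(power, alpha)
    noise_floor = sliding_minimum(smoothed, window)
    return [s > ratio * n for s, n in zip(smoothed, noise_floor)]


if __name__ == "__main__":
    # Synthetic input: steady noise, a louder burst, then noise again.
    frames = [1.0] * 50 + [20.0] * 10 + [1.0] * 50
    flags = detect_speech(frames, alpha=0.5, window=40, ratio=3.0)
    print("first flagged frame:", flags.index(True))
```

Because the floor is a minimum over a finite window, a burst longer than `window` frames would eventually be absorbed into the noise estimate; a practical detector would need a window longer than the longest expected speech segment, or some other safeguard.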

References (9)
Murat Saraclar, Vincent Goffin, Michael Riley, Enrico Bocchieri. Towards automatic closed captioning: low latency real time broadcast news transcription. Conference of the International Speech Communication Association (2002).
Qi Li, Jinsong Zheng, A. Tsai, Qiru Zhou. Robust endpoint detection and energy normalization for real-time speech and speaker recognition. IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 146-157 (2002). DOI: 10.1109/TSA.2002.1001979
D. Malah, R.V. Cox, A.J. Accardi. Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 789-792 (1999). DOI: 10.1109/ICASSP.1999.759789
Brian Kingsbury, George Saon, Lidia Mangu, Mukund Padmanabhan, Ruhi Sarikaya. Robust speech recognition in noisy environments: the 2001 IBM SPINE evaluation system. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 53-56 (2002). DOI: 10.1109/ICASSP.2002.5743652
R. Martin. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, vol. 9, pp. 504-512 (2001). DOI: 10.1109/89.928915
A.L. Gorin, B.A. Parker, R.M. Sachs, J.G. Wilpon. How may I help you? Speech Communication, vol. 23, pp. 113-127 (1997). DOI: 10.1016/S0167-6393(97)00040-X
Emil Julius Gumbel. Statistics of extremes (1958).
Dimitra Vergyri, Andreas Stolcke, Venkata Ramana Rao Gadde, M. Kemal Sönmez, Anand Venkataraman, Jing Zheng. Building an ASR System for Noisy Environments: SRI's 2001 SPINE Evaluation System. Conference of the International Speech Communication Association (2002).
Kingsbury, Saon, Mangu, Padmanabhan, Sarikaya. Robust speech recognition in noisy environments: the 2001 IBM SPINE evaluation system. International Conference on Acoustics, Speech, and Signal Processing, vol. 1 (2002). DOI: 10.1109/ICASSP.2002.1005673