Robust speech detection and segmentation for real-time ASR applications

DOI: 10.1109/ICASSP.2003.1198810

Keywords: Nonparametric statistics, Speech recognition, Bandwidth (signal processing), Voice activity detection, Pattern recognition, Background noise, Short-time Fourier transform, Computer science, Signal-to-noise ratio, Fourier transform, Artificial intelligence, Segmentation

Abstract: This paper provides a solution for robust speech detection that can be applied across a variety of tasks. The solution is based on an algorithm that performs non-parametric estimation of the background noise spectrum using minimum statistics of the smoothed short-time Fourier transform (STFT). It is shown that the new algorithm can operate effectively under varying signal-to-noise ratios. Results are reported on two tasks, HMIHY ("How May I Help You?") and SPINE (Speech in Noisy Environments), which differ in their speaking style, type of noise, and bandwidth. With a computational cost of less than 2% of real time on a 1 GHz P-3 machine and a latency of 400 ms, it is suitable for real-time ASR applications.
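The abstract names the core idea (tracking the noise floor as the minimum of a smoothed STFT power spectrum, then declaring speech where the signal rises well above that floor) without giving details. The sketch below is a simplified minimum-statistics estimator in the general spirit of R. Martin's approach, not the authors' algorithm. The helper names `min_stats_noise`, `detect_speech` and `to_segments`, the smoothing constant `alpha`, the 50-frame search window and the 6 dB threshold are all hypothetical choices. The input is assumed to be per-frame STFT power spectra computed elsewhere.

```python
import math
from collections import deque


def min_stats_noise(power_frames, alpha=0.85, window=50):
    """Per-bin noise floor: the minimum of the recursively smoothed power
    spectrum over the last `window` frames. Simplified minimum statistics:
    fixed smoothing constant, no bias compensation. alpha and window are
    hypothetical illustrative values, not taken from the paper."""
    n_bins = len(power_frames[0])
    smoothed = list(power_frames[0])          # start smoothing from the first frame
    history = [deque(maxlen=window) for _ in range(n_bins)]
    noise = []
    for frame in power_frames:
        floor = []
        for k in range(n_bins):
            # first-order recursive smoothing of |STFT|^2 in bin k
            smoothed[k] = alpha * smoothed[k] + (1.0 - alpha) * frame[k]
            history[k].append(smoothed[k])
            floor.append(min(history[k]))     # minimum over the search window
        noise.append(floor)
    return noise


def detect_speech(power_frames, noise, threshold_db=6.0):
    """Hypothetical frame decision: a frame is speech if its energy exceeds
    the estimated noise energy by more than threshold_db."""
    flags = []
    for frame, floor in zip(power_frames, noise):
        ratio = sum(frame) / max(sum(floor), 1e-12)
        flags.append(10.0 * math.log10(max(ratio, 1e-12)) > threshold_db)
    return flags


def to_segments(flags):
    """Group consecutive speech frames into half-open (start, end) frame ranges."""
    segments, start = [], None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


# Synthetic input: 4-bin power spectra with a 20-frame "speech" burst in flat noise.
frames = [[1.0] * 4] * 60 + [[20.0] * 4] * 20 + [[1.0] * 4] * 20
noise = min_stats_noise(frames)
print(to_segments(detect_speech(frames, noise)))  # [(60, 80)]
```

The minimum is taken over smoothed rather than raw power, so that a single quiet frame does not drag the floor down. Because the minimum of a smoothed noise process is biased low, a full minimum-statistics estimator adds bias compensation and adaptive smoothing, both omitted here. The search window must also be longer than the longest expected speech burst, or the floor will climb into the speech itself. A practical segmenter would further add hangover and minimum-duration rules before emitting segments.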
