Authors: I. Shafran, R. Rose
DOI: 10.1109/ICASSP.2003.1198810
Keywords: Nonparametric statistics, Speech recognition, Bandwidth (signal processing), Voice activity detection, Pattern recognition, Background noise, Short-time Fourier transform, Computer science, Signal-to-noise ratio, Fourier transform, Artificial intelligence, Segmentation
Abstract: This paper provides a solution for robust speech detection that can be applied across a variety of tasks. The solution is based on an algorithm that performs non-parametric estimation of the background noise spectrum using minimum statistics of the smoothed short-time Fourier transform (STFT). It is shown that the new algorithm can operate effectively under varying signal-to-noise ratios. Results are reported on two tasks, HMIHY and SPINE, which differ in their speaking style, background noise type, and bandwidth. With a computational cost of less than 2% of real time on a 1 GHz P-3 machine and a latency of 400 ms, it is suitable for ASR applications.
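The idea named in the abstract, estimating the noise floor from the minimum of a smoothed STFT and then comparing the signal against it, can be illustrated with a minimal sketch. This is not the authors' algorithm: the frame length, hop size, smoothing constant `alpha`, minimum-search window `win`, the 6 dB threshold, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def stft_power(x, frame_len=256, hop=128):
    """Power spectrogram from a Hann-windowed STFT; shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def min_stats_noise(power, alpha=0.9, win=50):
    """Return (smoothed power, noise floor) for each frame and bin.

    The noise floor is the minimum of the smoothed power over the last
    `win` frames. Both alpha and win are assumed values.
    """
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, len(power)):
        # First-order recursive smoothing along the time axis.
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * power[t]
    noise = np.empty_like(power)
    for t in range(len(power)):
        # Trailing minimum per frequency bin (causal, no look-ahead).
        noise[t] = smoothed[max(0, t - win + 1):t + 1].min(axis=0)
    return smoothed, noise

def detect_speech(x, threshold_db=6.0, alpha=0.9, win=50):
    """Flag frames whose smoothed energy exceeds the noise floor by threshold_db."""
    smoothed, noise = min_stats_noise(stft_power(x), alpha=alpha, win=win)
    eps = 1e-12  # guards against log of zero on silent input
    snr_db = 10 * np.log10((smoothed.sum(axis=1) + eps) /
                           (noise.sum(axis=1) + eps))
    return snr_db > threshold_db
```

In this sketch the noise floor follows the signal if activity lasts longer than `win` frames, so the window has to be longer than the longest stretch to be detected. The recursive smoothing also leaves frames flagged for a while after activity ends.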