A Data-driven Approach to Speech/Non-speech Detection

Authors: Sree Hari Krishnan Parthasarathi, Hynek Hermansky

DOI:

Keywords:

Abstract: We present a data-driven approach to weighting the temporal context of signal energy to be used in a simple speech/non-speech detector (SND). The optimal weights are obtained using linear discriminant analysis (LDA). Regularization is performed to handle the numerical issues inherent in the usage of correlated features. The weights so obtained can be interpreted as a filter in the modulation spectral domain. Experimental evaluations on the test data set, in terms of average frame-level error rate over different SNR levels, show that the proposed method yields an absolute performance gain of $10.9\%$, $17.5\%$, $7.9\%$ and $8.3\%$ over ITU's G.729B, ETSI's AMR1, AMR2 and a state-of-the-art multi-layer perceptron based system, respectively. This shows that even a simple feature such as full-band energy, when employed with a large-enough context, holds promise for applications.
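The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a per-frame log-energy track as input, a two-class Fisher LDA, a ridge term as the regularizer (the abstract does not say which regularization is used), and a decision threshold at the midpoint of the projected class means. The function names `stack_context` and `lda_weights` and all parameter values are hypothetical.

```python
import numpy as np


def stack_context(energy, half_width):
    """Stack each frame's energy with its +/- half_width neighbours.

    Row t of the result holds energy[t - half_width : t + half_width + 1];
    the edges are padded by repeating the first and last values.
    """
    padded = np.pad(energy, half_width, mode="edge")
    offsets = np.arange(2 * half_width + 1)
    index = np.arange(len(energy))[:, None] + offsets[None, :]
    return padded[index]


def lda_weights(features, labels, ridge=1e-3):
    """Two-class Fisher LDA with a ridge-regularized within-class scatter.

    Neighbouring energy frames are strongly correlated, so the scatter
    matrix can be ill-conditioned; the ridge term (an assumption here)
    keeps the linear solve numerically stable.
    """
    speech = features[labels == 1]
    nonspeech = features[labels == 0]
    mean_speech = speech.mean(axis=0)
    mean_nonspeech = nonspeech.mean(axis=0)

    within = np.cov(speech, rowvar=False) + np.cov(nonspeech, rowvar=False)
    dim = within.shape[0]
    # Scale the ridge by the average variance so it is unit-independent.
    within = within + ridge * np.trace(within) / dim * np.eye(dim)

    weights = np.linalg.solve(within, mean_speech - mean_nonspeech)
    # Assumed decision rule: threshold halfway between projected class means.
    threshold = 0.5 * (mean_speech + mean_nonspeech) @ weights
    return weights, threshold


if __name__ == "__main__":
    # Synthetic stand-in for a labelled energy track (not real speech data).
    rng = np.random.default_rng(0)
    labels = np.repeat(rng.integers(0, 2, size=200), 50)
    energy = 2.0 * labels + rng.normal(0.0, 2.0, size=labels.size)

    features = stack_context(energy, half_width=10)
    weights, threshold = lda_weights(features, labels)
    decisions = features @ weights > threshold
    print("frame accuracy:", np.mean(decisions == (labels == 1)))
```

Because the learned weights are applied as a sliding weighted sum over the energy trajectory, they act as an FIR filter on that trajectory. Under this sketch, the magnitude of `np.fft.rfft(weights)` would show which modulation frequencies the filter emphasizes.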
