A Data-driven Approach to Speech/Non-speech Detection

Authors: Sree Hari Krishnan Parthasarathi, Hynek Hermansky

Abstract: We present a data-driven approach to weighting the temporal context of signal energy to be used in a simple speech/non-speech detector (SND). The optimal weights are obtained using linear discriminant analysis (LDA). Regularization is performed to handle numerical issues inherent to the usage of correlated features. The weights so obtained are interpreted as a filter in the modulation spectral domain. Experimental evaluations on the test data set, in terms of average frame-level error rate over different SNR levels, show that the proposed method yields an absolute performance gain of $10.9\%$, $17.5\%$, $7.9\%$ and $8.3\%$ over ITU's G.729B, ETSI's AMR1, AMR2 and a state-of-the-art multi-layer perceptron based system, respectively. This shows that even a simple feature such as full-band energy, when employed with a large-enough context, shows promise for applications.
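
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-class Fisher LDA over stacked frame energies, a ridge-style regularizer (the abstract does not say which regularization is used), a context of 21 frames, and synthetic data. The names `stack_context`, `regularized_lda` and `ridge` are hypothetical.

```python
import numpy as np

def stack_context(log_energy, half_width):
    """Return an (n_frames, 2*half_width+1) matrix of neighbouring energies.

    Edges are padded by repeating the first/last frame (an assumption).
    """
    padded = np.pad(log_energy, half_width, mode="edge")
    offsets = np.arange(2 * half_width + 1)
    return padded[np.arange(len(log_energy))[:, None] + offsets[None, :]]

def regularized_lda(X, y, ridge=1e-2):
    """Two-class Fisher LDA with a ridge term added to the within-class scatter.

    The ridge form is an assumed stand-in for the regularization mentioned
    in the abstract; it keeps the solve well conditioned when neighbouring
    frame energies are strongly correlated.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw /= len(X) - 2
    dim = Sw.shape[0]
    Sw += ridge * (np.trace(Sw) / dim) * np.eye(dim)
    w = np.linalg.solve(Sw, mu1 - mu0)   # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)           # midpoint threshold, equal priors assumed
    return w, b

# Synthetic stand-in for frame log-energy: speech frames are louder on average.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(40) % 2, 50)          # alternating 50-frame segments
log_energy = 2.0 * labels + rng.normal(0.0, 1.0, labels.size)

X = stack_context(log_energy, half_width=10)       # 21-frame temporal context
w, b = regularized_lda(X, labels)
decisions = (X @ w + b > 0).astype(int)            # 1 = speech, 0 = non-speech
acc = float(np.mean(decisions == labels))

# The weights act as an FIR filter on the energy trajectory, so their
# magnitude response can be inspected along the modulation-frequency axis.
response = np.abs(np.fft.rfft(w, 256))
```

Here `w` holds one weight per context frame, and `response` is the magnitude response of those weights viewed as a filter, which is one way to read the modulation-domain interpretation mentioned in the abstract. The accuracy in `acc` is measured on the synthetic training data and says nothing about the error rates reported in the paper.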
