Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech.

Authors: Amr H. Nour-Eldin, Peter Kabal

DOI:

Keywords:

Abstract: In this paper, we extend our previous work on exploiting speech temporal properties to improve Bandwidth Extension (BWE) of narrowband speech using Gaussian Mixture Models (GMMs). By quantifying temporal properties through information-theoretic measures and delta features, we have shown that memory significantly increases certainty about highband parameters. However, as delta features are non-invertible, they cannot be directly used to reconstruct highband frequency content. In the work presented herein, we embed memory indirectly into the GMM structure through a memory-dependent tree-based approach to the representation of the narrow band. In particular, sequences of past frames are used to progressively grow the GMM in a tree-like fashion. This growth approach results in reliable estimates for the GMM parameters such that Maximum Likelihood estimation is no longer necessary, thus circumventing the complexity accompanying high-dimensionality GMM training.
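As background for the abstract, the following is a minimal sketch of conventional joint-GMM regression, the usual baseline for GMM-based BWE: a highband feature vector y is estimated from a narrowband feature vector x as the conditional mean E[y | x] under a GMM trained on joint vectors [x, y]. It is not the memory-dependent tree-based method of the paper, whose details the abstract does not give. The function name `gmm_conditional_mean` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def gmm_conditional_mean(x, weights, means, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM over z = [x, y].

    Hypothetical helper for illustration only (not from the paper).
    x       : narrowband feature vector, shape (dx,)
    weights : mixture weights, shape (K,)
    means   : joint means, shape (K, dx + dy)
    covs    : joint full covariances, shape (K, dx + dy, dx + dy)
    """
    x = np.asarray(x, dtype=float)
    log_resp = np.empty(len(weights))
    cond_means = []
    for k, (w, mu, S) in enumerate(zip(weights, means, covs)):
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(Sxx, diff)  # Sxx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(Sxx)
        # Log of w_k * N(x; mu_x, Sxx), the unnormalized responsibility.
        log_resp[k] = np.log(w) - 0.5 * (diff @ sol + logdet + dx * np.log(2 * np.pi))
        # Per-component conditional mean of y given x.
        cond_means.append(mu_y + Syx @ sol)
    # Normalize responsibilities in a numerically stable way.
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return resp @ np.array(cond_means)
```

In this baseline, memory could only enter by enlarging x with past frames or delta features, which raises the dimensionality of the joint GMM; the abstract presents the tree-based growth as a way around that training complexity.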

References (9)
Amr H. Nour-Eldin, Peter Kabal. Mel-frequency cepstral coefficient-based bandwidth extension of narrowband speech. Conference of the International Speech Communication Association, pp. 53-56 (2008).
Hynek Hermansky, Sangita Sharma. TRAPS - classifiers of temporal patterns. Conference of the International Speech Communication Association (1998).
Peter Jax, Peter Vary. On artificial bandwidth extension of telephone speech. Signal Processing, vol. 83, pp. 1707-1719 (2003). DOI: 10.1016/S0165-1684(03)00082-3
Mattias Nilsson, Harald Gustafsson, Soren Vang Andersen, W. Bastiaan Kleijn. Gaussian mixture model based mutual information estimation between frequency bands in speech. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 525-528 (2002). DOI: 10.1109/ICASSP.2002.5743770
A.H. Nour-Eldin, T.Z. Shabestary, P. Kabal. The Effect of Memory Inclusion on Mutual Information Between Speech Frequency Bands. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 53-56 (2006). DOI: 10.1109/ICASSP.2006.1660588
Peter Jax, Peter Vary. An upper bound on the quality of artificial bandwidth extension of narrowband speech signals. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 237-240 (2002). DOI: 10.1109/ICASSP.2002.5743698
Amr H. Nour-Eldin, Peter Kabal. Combining frontend-based memory with MFCC features for Bandwidth Extension of narrowband speech. International Conference on Acoustics, Speech, and Signal Processing, pp. 4001-4004 (2009). DOI: 10.1109/ICASSP.2009.4960505
S. Greenberg, B.E.D. Kingsbury. The modulation spectrogram: in pursuit of an invariant representation of speech. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1647-1650 (1997). DOI: 10.1109/ICASSP.1997.598826
Kun-Youl Park, Hyung Soon Kim. Narrowband to wideband conversion of speech using GMM based transformation. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1843-1846 (2000). DOI: 10.1109/ICASSP.2000.862114