Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task

Authors: George Kokkinakis, Nikos Fakotakis, Todor Ganchev

DOI:

Keywords:

Abstract: Making no claim of being exhaustive, a review of the most popular MFCC (Mel Frequency Cepstral Coefficients) implementations is made. These differ mainly in the particular approximation of the nonlinear pitch perception of humans, the filter bank design, and the compression of the filter bank output. Then, a comparative evaluation of the presented implementations is performed on the task of text-independent speaker verification, by means of the well-known 2001 NIST SRE (speaker recognition evaluation) one-speaker detection database.
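
To make the first two points of difference concrete, the following is a minimal sketch of two commonly used mel-scale approximations and of how filter center frequencies can be placed on one of them. It is illustrative only and is not taken from the paper: the record does not say which formulas or constants the compared implementations use. The function names are hypothetical. The logarithmic form 2595 * log10(1 + f/700) is a widely used approximation, and the piecewise form (linear below 1 kHz, logarithmic above) is of the kind used in Slaney's Auditory Toolbox.

```python
import math


def hz_to_mel_log(f_hz):
    """Logarithmic mel approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_log(mel):
    """Inverse of hz_to_mel_log."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_mel_piecewise(f_hz):
    """Piecewise warping: linear below 1 kHz, logarithmic above.

    Constants follow the Auditory Toolbox style convention (assumed here):
    200/3 Hz per unit in the linear part, and 27 logarithmic steps
    between 1 kHz and 6.4 kHz.
    """
    f_sp = 200.0 / 3.0               # Hz per unit in the linear region
    break_hz = 1000.0                # boundary between the two regions
    break_mel = break_hz / f_sp      # warped value at the boundary
    log_step = math.log(6.4) / 27.0  # log step size above the boundary
    if f_hz < break_hz:
        return f_hz / f_sp
    return break_mel + math.log(f_hz / break_hz) / log_step


def center_frequencies(n_filters, f_low_hz, f_high_hz):
    """Center frequencies (Hz) equally spaced on the logarithmic mel scale.

    The two band edges are excluded, so n_filters values are returned.
    """
    lo = hz_to_mel_log(f_low_hz)
    hi = hz_to_mel_log(f_high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz_log(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Compare the two warpings at a few frequencies (units differ by design).
    for f in (250.0, 1000.0, 4000.0):
        print(f, round(hz_to_mel_log(f), 1), round(hz_to_mel_piecewise(f), 2))
    print([round(c) for c in center_frequencies(5, 0.0, 4000.0)])
```

The two warpings use different units, so only their shapes are comparable: both are close to linear at low frequencies and compress high frequencies. Changing the warping shifts where the filters of a fixed-size filter bank fall, which is one way implementations can come to differ.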

References (6)
Gunnar Fant. Speech Sounds and Features. (1973)
Mark D. Skowronski, John G. Harris. Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition. The Journal of the Acoustical Society of America, vol. 116, pp. 1774-1780 (2004). DOI: 10.1121/1.1777872
Brian C. J. Moore, Brian R. Glasberg. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. The Journal of the Acoustical Society of America, vol. 74, pp. 750-753 (1983). DOI: 10.1121/1.389861
Fang Zheng, Guoliang Zhang, Zhanjiang Song. Comparison of different implementations of MFCC. Journal of Computer Science and Technology, vol. 16, pp. 582-589 (2001). DOI: 10.1007/BF02943243
S. Davis, P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, pp. 65-74 (1980). DOI: 10.1109/TASSP.1980.1163420