Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

Authors: Ahmed Krobba, Mohamed Debyeche, Sid-Ahmed Selouani

DOI: 10.1007/S11042-020-08748-2

Keywords: Speech recognition, Word error rate, Cepstrum, Gaussian, Linear prediction, Additive white Gaussian noise, Noise, Microphone, Mel-frequency cepstrum, Rayleigh fading, Computer science

Abstract: In this paper, we present a Mixture Linear Prediction based approach for robust Gammatone Cepstral Coefficients extraction (MLPGCCs). The proposed method improves the performance of Automatic Speaker Verification (ASV) using i-vector and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA) modeling under transmission channel noise. The extracted MLPGCCs were evaluated on the NIST 2008 database, in which conversational speech was recorded with a single microphone. The system is analyzed in the presence of different noises, such as Additive White Gaussian Noise (AWGN) and Rayleigh fading, at various Signal-to-Noise Ratio (SNR) levels. The evaluation results show that the proposed features are a promising approach for the ASV task. Indeed, speaker verification performance is significantly improved compared to conventional Gammatone Frequency Cepstral Coefficients (GFCCs) and Mel-Frequency Cepstral Coefficients (MFCCs). For speech signals corrupted with AWGN at SNRs ranging from -5 dB to 15 dB, we obtain a significant reduction in Equal Error Rate (EER), ranging from 9.41% to 6.65% and from 3.72% to 1.50% compared to MFCCs and GFCCs, respectively. In addition, when the test speech is corrupted by Rayleigh fading, we achieve an EER reduction ranging from 23.63% to 7.8% and from 10.88% to 6.8% compared to MFCCs and GFCCs, respectively. We also found that a feature combination gives the highest system performance; the best EERs achieved are around 0.43%, 0.59%, 1.92%, and 3.88%.
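The evaluation protocol summarized in the abstract (corrupting test speech with AWGN at a target SNR or with Rayleigh fading, then scoring verification trials by EER) can be sketched as follows. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the Rayleigh channel is a simplified flat block-fading model (one independent gain per block of samples, with unit mean power), and the EER is a plain threshold sweep rather than an official scoring tool. It does not implement MLPGCC extraction or i-vector/GPLDA scoring.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) == snr_db."""
    if rng is None:
        rng = random.Random(0)
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_noise)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def rayleigh_block_fading(signal, block_len, rng=None):
    """Scale each block of block_len samples by an independent Rayleigh gain.

    Simplified model (assumption): flat, piecewise-constant fading. The gain h
    is the magnitude of a zero-mean complex Gaussian, so E[h^2] = 1.
    """
    if rng is None:
        rng = random.Random(0)
    out = []
    for start in range(0, len(signal), block_len):
        h = math.hypot(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
        out.extend(h * x for x in signal[start:start + block_len])
    return out


def measured_snr_db(clean, noisy):
    """SNR of the noisy signal relative to the clean one, in dB."""
    p_signal = sum(c * c for c in clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score >= threshold. Returns the mean of the
    false-acceptance and false-rejection rates where they are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Hypothetical 2 s, 8 kHz test tone standing in for a speech signal.
    clean = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(16000)]
    for snr in (-5, 0, 5, 10, 15):
        noisy = add_awgn(clean, snr, random.Random(100 + snr))
        print(f"target {snr:>3} dB -> measured {measured_snr_db(clean, noisy):6.2f} dB")
    print("EER:", equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6]))
```

Scaling the noise from the measured signal power keeps the SNR definition explicit; a real pipeline would typically apply this per utterance before feature extraction, and would likely use a time-varying (Doppler) fading model instead of the block model assumed here.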
