A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information

Authors: Debadatta Pati, S. R. Mahadeva Prasanna

DOI: 10.1007/S12046-013-0163-Z

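Abstract: In this paper, the explicit and implicit modelling of the subsegmental excitation information are experimentally compared. For explicit modelling, the static and dynamic values of the standard Liljencrants–Fant (LF) parameters that model the glottal flow derivative (GFD) are used. A simplified approximation method is proposed to compute these LF parameters by locating the glottal closing and opening instants. The proposed approach significantly reduces the computation needed to implement the LF model. For implicit modelling, linear prediction (LP) residual samples considered in blocks of 5 ms with a shift of 2.5 ms are used. Different speaker recognition studies are performed using the NIST-99 and NIST-03 databases. In the case of speaker identification, the implicit modelling provides better performance compared to explicit modelling. Alternatively, the explicit modelling seems to provide better performance in the case of speaker verification. This indicates that explicit modelling seems to have relatively less intra- and inter-speaker variability. The implicit modelling, on the other hand, has more intra- and inter-speaker variability. What is desirable is less intra- and more inter-speaker variability. Therefore, explicit modelling may be used for the speaker verification task and implicit modelling for the speaker identification task. Further, for both the speaker identification and verification tasks, the explicit modelling provides relatively more complementary information to the state-of-the-art vocal tract features. The contribution of the explicit features is relatively more robust against noise. We suggest that the explicit approach can be used to model the subsegmental excitation information for speaker recognition.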
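
The implicit representation mentioned in the abstract (LP residual samples taken in 5 ms blocks with a 2.5 ms shift) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the LP order, the sampling rate, and the use of a single LP fit over the whole signal (rather than short-time frame-wise analysis) are assumptions made only to keep the example short.

```python
import numpy as np

def lp_coefficients(signal, order):
    """Autocorrelation-method LP: solve the normal equations R a = r."""
    n = len(signal)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(signal[: n - k], signal[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix, R[i, j] = r[|i - j|]
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[idx]
    # Tiny diagonal loading for numerical stability (an assumption of this sketch)
    R = R + 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])

def lp_residual(signal, order):
    """Prediction error e[n] = s[n] - sum_k a_k * s[n - k]."""
    a = lp_coefficients(signal, order)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(signal, inverse_filter)[: len(signal)]

def subsegmental_blocks(residual, fs, block_ms=5.0, shift_ms=2.5):
    """Cut the residual into overlapping blocks (5 ms long, 2.5 ms shift by default)."""
    block = int(round(fs * block_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    n_blocks = 1 + (len(residual) - block) // shift
    starts = shift * np.arange(n_blocks)
    # Each row is one block of consecutive residual samples
    return residual[starts[:, None] + np.arange(block)[None, :]]
```

At an assumed sampling rate of 8 kHz, `subsegmental_blocks(lp_residual(s, 10), fs=8000)` yields rows of 40 samples each, with consecutive rows overlapping by 20 samples. Each row would then serve as one feature vector for whatever speaker model is trained on the residual.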
