Modeling dynamic prosodic variation for speaker verification.

Authors: Mitchel Weintraub, Elizabeth Shriberg, Larry P. Heck, M. Kemal Sönmez

Abstract: Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture the local dynamics in intonation that characterize an individual’s speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker’s f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.

References (3)
Mitch Weintraub, Larry Heck, Kemal Sonmez, Yochai Konig. Nonlinear discriminant feature extraction for robust text-independent speaker recognition. Workshop on Speaker Recognition and its Commercial and Forensic Applications, RLA2C 1998 (1997).
Mitchel Weintraub, Elizabeth Shriberg, Larry P. Heck, M. Kemal Sönmez. A lognormal tied mixture model of pitch for prosody based speaker recognition. Conference of the International Speech Communication Association (1997).
M.J. Carey, E.S. Parris, H. Lloyd-Thomas, S. Bennett. Robust prosodic features for speaker identification. International Conference on Spoken Language Processing, vol. 3, pp. 1800-1803 (1996). doi:10.1109/ICSLP.1996.607979