Authors: Mitchel Weintraub, Elizabeth Shriberg, Larry P. Heck, M. Kemal Sönmez
DOI:
Keywords:
Abstract: Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture the local dynamics of intonation that characterize an individual’s speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker’s f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
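
The stylization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' fitting procedure, which the abstract does not specify. It assumes a greedy top-down split with an independent least-squares line per segment, an input that contains voiced frames only, and hypothetical names and parameters throughout (`fit_line`, `stylize`, `segment_features`, `tol`, `min_len`). Segments fitted this way are not forced to join continuously.

```python
from typing import List, Sequence, Tuple

# (t_start, t_end, slope, intercept) for one linear piece; a hypothetical layout.
Segment = Tuple[float, float, float, float]


def fit_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = cov / var_t if var_t > 0 else 0.0
    return slope, mean_y - slope * mean_t


def stylize(ts: Sequence[float], ys: Sequence[float],
            tol: float = 1.0, min_len: int = 3) -> List[Segment]:
    """Split the f0 track recursively until every piece fits within tol.

    Assumes ts and ys are non-empty, equally long, and cover voiced frames only.
    """
    slope, intercept = fit_line(ts, ys)
    resid = [abs(y - (slope * t + intercept)) for t, y in zip(ts, ys)]
    # Stop when the fit is good enough or the span is too short to split.
    if max(resid) <= tol or len(ts) < 2 * min_len:
        return [(ts[0], ts[-1], slope, intercept)]
    # Split at the worst-fitting frame, keeping min_len frames on each side.
    k = max(range(min_len, len(ts) - min_len + 1), key=lambda i: resid[i])
    return (stylize(ts[:k], ys[:k], tol, min_len)
            + stylize(ts[k:], ys[k:], tol, min_len))


def segment_features(segments: Sequence[Segment]) -> List[Tuple[float, float]]:
    """Per-segment (slope, duration) pairs, one possible feature choice."""
    return [(slope, t_end - t_start) for t_start, t_end, slope, _ in segments]
```

Calling `segment_features(stylize(ts, ys))` on a voiced stretch yields one (slope, duration) pair per linear piece. Which model parameters the authors actually use as features is not stated in the abstract, so this pairing is an assumption.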