Prosodic Features for Speaker Recognition

Author: Leena Mary

DOI: 10.1007/978-1-4614-0263-3_13

Abstract: In this chapter, the effectiveness of syllable-based prosodic features for speaker recognition is discussed. The term prosody refers to a collection of characteristics such as intonation, stress and timing, which are primarily expressed through variations in pitch, energy and duration at various levels of speech. Prosody reflects the learned or acquired speaking habits of a person and hence contributes to speaker recognition. Because prosodic features are less affected by channel mismatch and noise, they are particularly well suited to forensics, a field that demands accurate identification of suspects with as few mitigating conditions as possible. In this chapter, the author describes a method for extracting prosodic features directly from the speech signal. Applying this method, the speech signal is segmented into syllable-like regions using vowel onset points (VOPs). The locations of the VOPs serve as reference points for the extraction and representation of the prosodic features. The method is demonstrated on the extended data task of the NIST 2003 speaker recognition evaluation. Combining the evidence from spectral features with the proposed prosodic features helps to improve the overall accuracy.
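
As a rough illustration of how VOP locations can anchor syllable-level prosodic features, the Python sketch below assumes that VOPs have already been detected and are given as frame indices, together with frame-level F0 (0 for unvoiced frames) and energy contours. As a simplification, each syllable-like region is taken to run from one VOP to the next. The four per-region features (mean F0, F0 span, mean energy, duration), the weighted-sum fusion of spectral and prosodic scores, the 10 ms frame shift, and all function names are hypothetical choices made for illustration. They are not the chapter's actual feature set or fusion rule.

```python
import numpy as np


def syllable_regions(vops, n_frames):
    """Return (start, end) frame ranges, one per VOP.

    Simplifying assumption (not the chapter's definition): a syllable-like
    region runs from one VOP to the next VOP, or to the end of the signal.
    Frames before the first VOP are ignored, out-of-range VOPs are discarded,
    and zero-length regions are dropped.
    """
    valid = sorted(v for v in vops if 0 <= v < n_frames)
    bounds = valid + [n_frames]
    return [(bounds[i], bounds[i + 1])
            for i in range(len(bounds) - 1)
            if bounds[i + 1] > bounds[i]]


def prosodic_features(f0, energy, vops, frame_shift=0.01):
    """Hypothetical per-region features: [mean F0, F0 span, mean energy, duration].

    f0          : frame-level F0 in Hz, with 0 marking unvoiced frames
    energy      : frame-level energy, same length as f0
    vops        : VOP locations as frame indices
    frame_shift : seconds per frame (10 ms assumed)
    """
    f0 = np.asarray(f0, dtype=float)
    energy = np.asarray(energy, dtype=float)
    feats = []
    for start, end in syllable_regions(vops, len(f0)):
        voiced = f0[start:end]
        voiced = voiced[voiced > 0]  # keep voiced frames only
        mean_f0 = voiced.mean() if voiced.size else 0.0
        f0_span = voiced.max() - voiced.min() if voiced.size else 0.0
        mean_energy = energy[start:end].mean()
        duration = (end - start) * frame_shift  # region length in seconds
        feats.append([mean_f0, f0_span, mean_energy, duration])
    # reshape keeps a (0, 4) shape when no regions are found
    return np.array(feats).reshape(-1, 4)


def fuse_scores(spectral_score, prosodic_score, weight=0.7):
    """Weighted-sum score fusion; the weight is a placeholder to be tuned."""
    return weight * spectral_score + (1.0 - weight) * prosodic_score


if __name__ == "__main__":
    # Toy contours: two VOPs at frames 1 and 5 give two syllable-like regions.
    f0 = [0, 100, 110, 120, 0, 200, 210, 0]
    energy = [1.0] * 8
    print(prosodic_features(f0, energy, vops=[1, 5]))
    print(fuse_scores(spectral_score=1.2, prosodic_score=0.4))
```

In a real system, the per-region feature vectors would be used to train a speaker model, and the fusion weight would be tuned on development data. Both steps are outside the scope of this sketch.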
