Depitch and the role of fundamental frequency in speaker recognition

Authors: R.D. Zilca, J. Navratil, G.N. Ramaswamy

DOI: 10.1109/ICASSP.2003.1202299

Abstract: Pitch information is known to be partially conveyed in the Mel cepstral features that are commonly used for speaker recognition. In particular, for high-pitched female speakers, and whenever the average pitch varies significantly between enrollment and testing, the fine spectral structure introduced by the fundamental frequency was shown to degrade recognition performance. This paper introduces a signal processing procedure, termed depitch, that attempts to remove pitch information from the speech signal. Recognition experiments carried out on a subset of the NIST 2002 Speaker Recognition Evaluation show that by combining the scores of a conventional system and a depitched system, a substantial improvement in equal error rate is obtained for such speakers and for pitch-mismatched trials. Performing pitch/depitch score fusion may also help alleviate the well-known problem of "goat" speakers.
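The abstract does not state how the conventional and depitched scores are combined or how the equal error rate (EER) is measured. The following is a minimal Python sketch under stated assumptions, not the authors' method: it assumes a weighted linear fusion (the helper names `fuse_scores` and `equal_error_rate` and the `weight` parameter are hypothetical) and estimates the EER by sweeping a decision threshold over the observed scores. The depitch signal processing itself is not shown, and the scores are made-up toy values, not results from the paper.

```python
import numpy as np


def fuse_scores(conventional, depitched, weight=0.5):
    """Hypothetical linear fusion: weight * conventional + (1 - weight) * depitched.

    Assumes both systems produce scores on comparable scales.
    """
    conventional = np.asarray(conventional, dtype=float)
    depitched = np.asarray(depitched, dtype=float)
    return weight * conventional + (1.0 - weight) * depitched


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target, nontarget]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target < t)             # true speakers rejected
        false_alarm = np.mean(nontarget >= t)  # impostors accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # Where miss and false-alarm rates are closest, report their mean.
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


# Toy scores (invented): each system makes one ranking error on a different trial.
conv_t, conv_n = [2.0, 1.5, 0.2], [0.5, -1.0, -0.5]
dep_t, dep_n = [1.8, 0.1, 1.4], [-0.2, 0.4, -1.2]

print(equal_error_rate(conv_t, conv_n))  # about 0.333
print(equal_error_rate(dep_t, dep_n))    # about 0.333
print(equal_error_rate(fuse_scores(conv_t, dep_t), fuse_scores(conv_n, dep_n)))  # 0.0
```

In this toy case the two systems err on different trials, so their averaged scores separate true speakers from impostors; that complementarity is the intuition behind fusing a conventional and a depitched system. Real systems would usually normalize scores and tune the fusion weight on held-out data before applying it.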
