Pseudo Pitch Synchronous Analysis of Speech With Applications to Speaker Recognition

Authors: R.D. Zilca, B. Kingsbury, J. Navratil, G.N. Ramaswamy

DOI: 10.1109/TSA.2005.857809

Keywords:

Abstract: The fine spectral structure related to pitch information is conveyed in Mel cepstral features, with variations in pitch causing variations in the features. For speaker recognition systems, this phenomenon, known as “pitch mismatch” between training and testing, can increase error rates. Likewise, pitch-related variability may potentially increase error rates in speech recognition systems for languages such as English, in which pitch does not carry phonetic information. In addition, for both speech recognition and speaker recognition systems, the parsing of the raw speech signal into frames is traditionally performed using a constant frame size and a constant frame offset, without aligning the frames to the natural pitch cycles. As a result, the power spectral estimation that is done as part of the Mel cepstral computation may include artifacts. Pitch synchronous methods have addressed this problem in the past, at the expense of adding some complexity by using a variable frame size and/or offset. This paper introduces Pseudo Pitch Synchronous (PPS) signal processing procedures that attempt to align each individual frame to its natural cycle and avoid truncation of pitch cycles while still using a constant frame size and frame offset, in an effort to address the above problems. Text independent speaker recognition experiments performed on NIST speaker recognition tasks demonstrate a performance improvement when the scores produced by systems using PPS are fused with traditional speaker recognition scores. In addition, a better distribution of errors across trials may be obtained for similar error rates, and some insight regarding the role of the fundamental frequency in speaker recognition is revealed. Speech recognition experiments run on the Aurora-2 noisy digits task also show improved robustness and better accuracy for extremely low signal-to-noise ratio (SNR) data.
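The contrast the abstract draws between conventional framing and pitch-aligned framing can be illustrated with a minimal sketch. This is not the PPS procedure from the paper, whose details are not given in this record. It only shows, under stated assumptions, how a constant frame size can be kept while each nominal frame start is nudged toward a pitch cycle boundary. The function names, the `max_shift` parameter, and the availability of pitch marks from some external pitch tracker are all hypothetical.

```python
def fixed_frames(n_samples, frame_size, frame_offset):
    """Conventional parsing: frame k starts at k * frame_offset."""
    return list(range(0, n_samples - frame_size + 1, frame_offset))


def snap_to_pitch_marks(starts, pitch_marks, n_samples, frame_size, max_shift):
    """Move each nominal frame start to the nearest pitch mark.

    Assumption (not from the paper): pitch_marks are sample indices of
    pitch cycle onsets supplied by an external pitch tracker. A start is
    moved only if the nearest mark lies within max_shift samples and the
    shifted frame still fits inside the signal; otherwise the nominal
    start is kept. The frame size itself never changes.
    """
    aligned = []
    for start in starts:
        nearest = min(pitch_marks, key=lambda mark: abs(mark - start))
        fits = nearest + frame_size <= n_samples
        if abs(nearest - start) <= max_shift and fits:
            aligned.append(nearest)
        else:
            aligned.append(start)
    return aligned


# Hypothetical example: 1000 samples, 200-sample frames, 100-sample offset,
# and a pitch cycle onset every 80 samples beginning at sample 10.
n_samples, frame_size, frame_offset = 1000, 200, 100
nominal = fixed_frames(n_samples, frame_size, frame_offset)
marks = list(range(10, n_samples, 80))
aligned = snap_to_pitch_marks(nominal, marks, n_samples, frame_size, max_shift=40)
```

In this sketch every frame keeps the same length and the frame starts stay close to the nominal constant-offset grid, so downstream feature extraction sees the usual number of equally sized frames.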

References (22)
Juan M. Huerta, George Saon. Improvements to the IBM Aurora 2 multi-condition system. Conference of the International Speech Communication Association (2002).
Wolfgang Hess. Pitch Determination of Speech Signals. Springer Berlin Heidelberg (1983). DOI: 10.1007/978-3-642-81926-1
Ganesh N. Ramaswamy, Jiří Navrátil. DETAC: a discriminative criterion for speaker verification. Conference of the International Speech Communication Association (2002).
Ganesh N. Ramaswamy, Jiří Navrátil. The awe and mystery of T-norm. Conference of the International Speech Communication Association (2003).
Ran D. Zilca, Ganesh N. Ramaswamy, Jiří Navrátil. "syncpitch": a pseudo pitch synchronous algorithm for speaker recognition. Conference of the International Speech Communication Association (2003).
Thomas F. Quatieri, Douglas A. Reynolds, Robert B. Dunn. On the influence of rate, pitch, and spectrum on automatic speaker recognition performance. Conference of the International Speech Communication Association, pp. 491-494 (2000).
Mark A. Przybocki, Douglas A. Reynolds, Alvin F. Martin, George R. Doddington, Walter Liggett. Sheep, Goats, Lambs and Wolves: A Statistical Analysis of Speaker Performance in the NIST 1998 Speaker Recognition Evaluation. Conference of the International Speech Communication Association (1998).
Mark Ordowski, Mark A. Przybocki, Alvin F. Martin, George R. Doddington, Terri Kamm. The DET Curve in Assessment of Detection Task Performance. Conference of the International Speech Communication Association (1997).
G. Saon, M. Padmanabhan, R. Gopinath. Eliminating inter-speaker variability prior to discriminant transforms. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 73-76 (2001). DOI: 10.1109/ASRU.2001.1034592
M.J. Carey, E.S. Parris, H. Lloyd-Thomas, S. Bennett. Robust prosodic features for speaker identification. International Conference on Spoken Language Processing, vol. 3, pp. 1800-1803 (1996). DOI: 10.1109/ICSLP.1996.607979