Authors: R.D. Zilca, B. Kingsbury, J. Navratil, G.N. Ramaswamy
Keywords:
Abstract: The fine spectral structure related to pitch information is conveyed in Mel cepstral features, with variations in pitch causing variations in the features. For speaker recognition systems, this phenomenon, known as “pitch mismatch” between training and testing, can increase error rates. Likewise, pitch-related variability may potentially increase error rates in speech recognition systems for languages such as English, in which pitch does not carry phonetic information. In addition, for both speech recognition and speaker recognition systems, the parsing of the raw speech signal into frames is traditionally performed using a constant frame size and a constant frame offset, without aligning the frames to the natural pitch cycles. As a result, the power spectral estimation that is done as part of the Mel cepstral computation may include artifacts. Pitch synchronous methods have addressed this problem in the past, at the expense of adding some complexity by using a variable frame size and/or offset. This paper introduces Pseudo Pitch Synchronous (PPS) signal processing procedures that attempt to align each individual frame to its natural cycle and avoid truncation of pitch cycles while still using a constant frame size and frame offset, in an effort to address the above problems. Text independent speaker recognition experiments performed on NIST speaker recognition tasks demonstrate a performance improvement when the scores produced by systems using PPS are fused with traditional speaker recognition scores. In addition, a better distribution of errors across trials may be obtained for similar error rates, and some insight regarding the role of the fundamental frequency in speaker recognition is revealed. Speech recognition experiments run on the Aurora-2 noisy digits task also show improved robustness and better accuracy for extremely low signal-to-noise ratio (SNR) data.
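
The framing issue described in the abstract can be illustrated with a small sketch. It is not the paper's PPS procedure, whose details the abstract does not give. It only shows conventional constant-size, constant-offset framing, followed by one hypothetical way to avoid truncated pitch cycles without changing either constant: zeroing the partial cycles at both ends of each frame. The function names, the toy signal, and the evenly spaced pitch marks are all assumptions made for illustration.

```python
import numpy as np

def frame_fixed(signal, frame_size, frame_offset):
    """Conventional framing: constant size and offset, ignoring pitch."""
    starts = np.arange(0, len(signal) - frame_size + 1, frame_offset)
    frames = np.stack([signal[s:s + frame_size] for s in starts])
    return starts, frames

def keep_whole_cycles(frames, starts, pitch_marks):
    """Hypothetical alignment step (an assumption, not taken from the paper):
    keep only the samples between the first and last pitch mark that fall
    inside each frame, and zero the partial cycles at the frame edges."""
    pitch_marks = np.asarray(pitch_marks)
    frame_size = frames.shape[1]
    out = np.zeros_like(frames)
    spans = []
    for i, s in enumerate(starts):
        inside = pitch_marks[(pitch_marks >= s) & (pitch_marks <= s + frame_size)]
        if len(inside) >= 2:  # at least one complete cycle lies in the frame
            a, b = int(inside[0] - s), int(inside[-1] - s)
            out[i, a:b] = frames[i, a:b]
        else:
            a, b = 0, 0  # no complete cycle: the frame stays all zeros
        spans.append((a, b))
    return out, spans

# Toy data (assumed values): a sinusoid with a steady 64-sample pitch period,
# e.g. 125 Hz at an 8 kHz sampling rate, and idealized pitch marks.
period = 64
n = np.arange(800)
signal = np.sin(2 * np.pi * n / period)
pitch_marks = np.arange(0, len(signal) + 1, period)

starts, frames = frame_fixed(signal, frame_size=200, frame_offset=80)
aligned, spans = keep_whole_cycles(frames, starts, pitch_marks)
```

In this toy setup the frame grid is unchanged, so frame size and offset remain constant, while each retained span covers a whole number of pitch cycles. Real speech would require an actual pitch-mark estimator and some handling of unvoiced frames, both of which are outside this sketch.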