Frame pruning for speaker recognition

Authors: L. Besacier, J.F. Bonastre

DOI: 10.1109/ICASSP.1998.675377

Abstract: In this paper, we propose a frame selection procedure for text-independent speaker identification. Instead of averaging the frame likelihoods along the whole test utterance, some of these are rejected (pruning) and the final score is computed with a limited number of frames. This pruning stage requires a prior frame-level likelihood normalization in order to make comparison between frames meaningful. This normalization alone leads to a significant performance enhancement. As far as pruning is concerned, the optimal number of pruned frames is learned on a tuning data set for normal and telephone speech. Validation on 567 speakers shows a 27% identification rate improvement on TIMIT and a 17% improvement on NTIMIT.
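The pipeline described in the abstract (normalize frame likelihoods, reject some frames, average the rest) can be illustrated with a minimal sketch. The abstract does not specify the normalization or the rejection criterion, so everything below is an assumption made for illustration: each frame is normalized by subtracting the log-sum-exp over all speaker models, and for each speaker the lowest-scoring fraction of normalized frames is dropped. The names `normalize_frames`, `pruned_scores`, `identify`, and `prune_fraction` are hypothetical and do not come from the paper.

```python
import math


def normalize_frames(loglik):
    """Normalize frame log-likelihoods so frames are comparable.

    loglik[t][s] is the log-likelihood of frame t under speaker model s.
    Assumption (not from the paper): subtract the log-sum-exp over speakers.
    """
    normalized = []
    for frame in loglik:
        peak = max(frame)
        # Numerically stable log-sum-exp over the speaker models.
        lse = peak + math.log(sum(math.exp(v - peak) for v in frame))
        normalized.append([v - lse for v in frame])
    return normalized


def pruned_scores(loglik, prune_fraction):
    """Average the normalized frame scores per speaker after pruning.

    Assumption (not from the paper): for each speaker, the lowest-scoring
    `prune_fraction` of frames is rejected before averaging.
    """
    norm = normalize_frames(loglik)
    n_frames = len(norm)
    n_speakers = len(norm[0])
    # Always keep at least one frame.
    n_keep = max(1, n_frames - int(prune_fraction * n_frames))
    scores = []
    for s in range(n_speakers):
        column = sorted((frame[s] for frame in norm), reverse=True)
        kept = column[:n_keep]
        scores.append(sum(kept) / len(kept))
    return scores


def identify(loglik, prune_fraction=0.0):
    """Return the index of the speaker with the highest pruned score."""
    scores = pruned_scores(loglik, prune_fraction)
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, `prune_fraction` plays the role of the quantity the abstract says is learned on a tuning set; one would sweep it on held-out data and keep the value with the best identification rate. With `prune_fraction=0.0` the function reduces to plain averaging of normalized frame scores.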

References (5)
Douglas D. O'Shaughnessy, Rivarol Vergin. A double Gaussian mixture modeling approach to speaker recognition. Conference of the International Speech Communication Association (1997).
Laurent Besacier, Jean-François Bonastre. Subband Approach for Automatic Speaker Recognition: Optimal Division of the Frequency Domain. AVBPA '97: Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 195-202 (1997). DOI: 10.1007/BFB0015996
Frédéric Bimbot, Ivan Magrin-Chagnolleau, Luc Mathan. Second-order statistical measures for text-independent speaker identification. Speech Communication, vol. 17, pp. 177-192 (1995). DOI: 10.1016/0167-6393(95)00013-E
H. Gish, M. Schmidt. Text-independent speaker identification. IEEE Signal Processing Magazine, vol. 11, pp. 18-32 (1994). DOI: 10.1109/79.317924
C. Jankowski, A. Kalyanswamy, S. Basson, J. Spitz. NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database. International Conference on Acoustics, Speech, and Signal Processing, pp. 109-112 (1990). DOI: 10.1109/ICASSP.1990.115550