Speaker segmentation based on between-window correlation over speakers' characteristics

Authors: Thomas Fang Zheng, Gang Wang

Abstract: Speaker segmentation is widely applied in many domains, such as multi-speaker detection and speaker tracking. However, the performance of conventional metric-based methods is neither good enough nor stable, owing to the instability of the between-window distance calculation. To enhance stability and hence improve performance, a new method based on between-window correlation over speakers' characteristics is proposed. In this method, a set of reference speaker models is trained that can represent the whole speaker model space. The likelihood score vectors against these reference models are taken as the metric. Gender information and Peak Valley are also used. Experiments on the NIST SRE 2002 Segmentation BNEWS and SWBD datasets show that better performance can be achieved compared with the BIC and GLR methods. Moreover, the proposed method achieves approximately its best performance over a wider range of predefined threshold values than the BIC and GLR methods, which reduces threshold sensitivity.
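The core idea in the abstract, scoring each analysis window against a set of reference models and comparing adjacent windows by the correlation of their score vectors, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes single diagonal Gaussians as stand-ins for the reference speaker models, Pearson correlation as the correlation measure, and synthetic feature frames; the function names `window_score_vector` and `between_window_correlation` are hypothetical.

```python
import numpy as np


def window_score_vector(frames, ref_means, ref_vars):
    """Average per-frame log-likelihood of one window under each reference model.

    Assumption: each reference model is a single diagonal Gaussian.
    frames: (T, D); ref_means, ref_vars: (K, D). Returns a vector of shape (K,).
    """
    diff = frames[:, None, :] - ref_means[None, :, :]            # (T, K, D)
    per_dim = np.log(2.0 * np.pi * ref_vars)[None] + diff ** 2 / ref_vars[None]
    log_lik = -0.5 * per_dim.sum(axis=2)                          # (T, K)
    return log_lik.mean(axis=0)                                   # (K,)


def between_window_correlation(left, right, ref_means, ref_vars):
    """Pearson correlation between the score vectors of two adjacent windows."""
    a = window_score_vector(left, ref_means, ref_vars)
    b = window_score_vector(right, ref_means, ref_vars)
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic illustration: 32 hypothetical reference models in a 4-dim feature space.
rng = np.random.default_rng(0)
K, D, T = 32, 4, 200
ref_means = rng.normal(0.0, 3.0, size=(K, D))
ref_vars = np.ones((K, D))

# Two synthetic "speakers" with different mean feature vectors.
speaker_a = lambda: rng.normal(3.0, 1.0, size=(T, D))
speaker_b = lambda: rng.normal(-3.0, 1.0, size=(T, D))

same_corr = between_window_correlation(speaker_a(), speaker_a(), ref_means, ref_vars)
diff_corr = between_window_correlation(speaker_a(), speaker_b(), ref_means, ref_vars)

print(f"same speaker:      {same_corr:.3f}")
print(f"different speaker: {diff_corr:.3f}")
```

In this toy setup, a low correlation between adjacent windows would be read as a candidate speaker change point, with a predefined threshold deciding the cut. How the paper combines this with gender information is not shown here.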