Voting for two speaker segmentation

Authors: Narayanaswamy Balakrishnan, Richard M. Stern, Rashmi Gangadharaiah

Keywords: Speaker diarisation, Change detection, Speech recognition, Computer science, Pattern recognition, Voting, Scale-space segmentation, Speaker recognition, Artificial intelligence, Cluster analysis, Segmentation

Abstract: The process of locating the end points of each speaker's voice in an audio file and then clustering segments based on speaker identity is called speaker segmentation. In this paper we present a method for two-speaker segmentation, though it can be extended to more than two speakers. Most methods of speaker segmentation start with an initial, computationally inexpensive segmentation method, followed by a more accurate segment clustering step. We describe a simple algorithm that improves the accuracy of the clustering step without increasing computational complexity. Since clustering is done iteratively, the improvement at each step results in a significant overall increase in cluster purity. We borrow ideas from speaker recognition and perform frame-based voting: we treat each frame as an independent classifier deciding which speaker generated the segment. These 'classifiers' are combined by voting to decide which segments should be clustered together. This change leads to a 56.9% decrease in error rates on a two-speaker segmentation task using the SWITCHBOARD corpus. Index Terms: Speaker segmentation, Voting, Classifier combination, Change detection
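The abstract describes treating each frame as an independent classifier and combining the frame decisions by voting. The Python sketch below illustrates that idea under loudly stated assumptions: each candidate speaker cluster is modeled by a single diagonal-covariance Gaussian (a hypothetical simplification; the models and features used in the paper are not specified here), and the names `DiagGaussian`, `assign_by_voting`, and `assign_by_total_likelihood` are hypothetical. It contrasts frame-level majority voting with the common baseline of summing log-likelihoods over the whole segment.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per frame (assumed format)


class DiagGaussian:
    """Hypothetical speaker-cluster model: a single diagonal-covariance Gaussian.

    A simplification for illustration only; real systems typically use richer
    models such as Gaussian mixture models trained with EM.
    """

    def __init__(self, frames: List[Frame], var_floor: float = 1e-3) -> None:
        n, dim = len(frames), len(frames[0])
        self.mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Floor the variances so a cluster of near-identical frames stays usable.
        self.var = [
            max(sum((f[d] - self.mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]

    def log_likelihood(self, frame: Frame) -> float:
        # Sum of independent per-dimension Gaussian log densities.
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(frame, self.mean, self.var)
        )


def assign_by_voting(segment: List[Frame], models: List[DiagGaussian]) -> int:
    """Each frame votes for the model under which it is most likely; the segment
    goes to the model with the most votes (ties go to the lowest index)."""
    votes = [0] * len(models)
    for frame in segment:
        scores = [m.log_likelihood(frame) for m in models]
        votes[scores.index(max(scores))] += 1
    return votes.index(max(votes))


def assign_by_total_likelihood(segment: List[Frame], models: List[DiagGaussian]) -> int:
    """Baseline for comparison: pick the model with the highest summed log-likelihood."""
    totals = [sum(m.log_likelihood(f) for f in segment) for m in models]
    return totals.index(max(totals))


if __name__ == "__main__":
    # Toy 1-D data: cluster A is tight around 0, cluster B is broad around 3.
    a = DiagGaussian([[-0.1], [0.0], [0.1]])
    b = DiagGaussian([[1.0], [3.0], [5.0]])
    # Three frames that clearly resemble A, plus one outlier frame.
    segment = [[0.05], [0.0], [-0.05], [8.0]]
    print("voting:", assign_by_voting(segment, [a, b]))  # 0 (A wins 3 votes to 1)
    print("total likelihood:", assign_by_total_likelihood(segment, [a, b]))  # 1
```

In this toy example the single outlier frame (8.0) is so unlikely under the narrow model `a` that it dominates the summed log-likelihood and pulls the segment toward `b`, while under voting the three remaining frames still outvote it. This is one plausible reading of why frame voting can raise cluster purity; it is an illustration, not the authors' reported analysis.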