A novel method for two-speaker segmentation.

Authors: Narayanaswamy Balakrishnan, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah

Abstract: This paper addresses the problem of speaker-based audio data segmentation. A novel method that has the advantages of both model-based and metric-based techniques is proposed, which creates a model for each speaker from the available data on the fly. This can be viewed as building a Hidden Markov Model (HMM) for the data, with speakers abstracted as the hidden states. Each speaker/state is modeled with a Gaussian Mixture Model (GMM). To prevent a large number of spurious change points from being detected, the use of the Generalized Likelihood Ratio (GLR) metric for grouping feature vectors is proposed. A clustering technique is described, through which a good initialization of each GMM is achieved, such that each state corresponds to a single speaker and not to noise, silence or word classes, something that may happen in conventional unlabelled clustering. Finally, a refinement method, along the lines of Viterbi training of HMMs, is presented. The proposed method does not require prior knowledge of any speaker characteristics. It also does not require any tuning of threshold parameters, so it can be used with confidence over new data sets. The method assumes that the number of speakers is known a priori to be two. The method results in a decrease in the error rate of 84.75% on the files reported in the baseline system. It performs just as well even when the speaker segments are as short as 1 s each, which is a large improvement over some previous methods, which require larger segments for accurate detection of speaker change points.
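As a rough illustration of the GLR metric mentioned in the abstract, the sketch below computes a GLR distance between two groups of feature vectors. It assumes each group is modeled by a single full-covariance Gaussian fitted by maximum likelihood, which is a common simplification and not necessarily the configuration used in the paper (the paper models each speaker with a GMM). The function names `gaussian_loglik_ml` and `glr_distance` are hypothetical and chosen only for this example.

```python
import numpy as np


def gaussian_loglik_ml(frames):
    """Log-likelihood of `frames` (n x d) under their own ML Gaussian.

    With the ML mean and (biased) covariance plugged in, the log-likelihood
    reduces to -n/2 * (d*log(2*pi) + log|cov| + d).
    """
    n, d = frames.shape
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2.0 * np.pi) + logdet + d)


def glr_distance(x, y):
    """GLR distance between two groups of feature vectors.

    Compares "x and y come from two separate Gaussians" against
    "x and y come from one shared Gaussian". Larger values suggest
    that the two groups belong to different sources (assumed: speakers).
    """
    z = np.vstack([x, y])
    return gaussian_loglik_ml(x) + gaussian_loglik_ml(y) - gaussian_loglik_ml(z)
```

Under these assumptions the distance is never negative, because two separately fitted Gaussians always explain the data at least as well as one shared Gaussian. Groups with a small distance would be merged, while a large distance would indicate a candidate speaker change.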

References (8)
Ramesh A. Gopinath, Alain Tritschler, Improved speaker segmentation and segments clustering using the Bayesian information criterion. Conference of the International Speech Communication Association (1999).
Shrikanth S. Narayanan, Soonil Kwon, Speaker change detection using a new weighted distance measure. Conference of the International Speech Communication Association (2002).
S. Chen, Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion. Proc. DARPA Broadcast News Transcription and Understanding Workshop (1998).
Lie Lu, Hong-Jiang Zhang, Hao Jiang, Content analysis for audio classification and segmentation. IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 504-516 (2002), 10.1109/TSA.2002.804546.
P. Delacourt, C. Wellekens, Audio data indexing: Use of second-order statistics for speaker-based segmentation. Proceedings IEEE International Conference on Multimedia Computing and Systems, vol. 2, pp. 959-963 (1999), 10.1109/MMCS.1999.778619.
H. Gish, M.-H. Siu, R. Rohlicek, Segregation of speakers for speech recognition and speaker identification. International Conference on Acoustics, Speech, and Signal Processing, pp. 873-876 (1991), 10.1109/ICASSP.1991.150477.