A novel method for two-speaker segmentation.

Authors: Narayanaswamy Balakrishnan, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah

Abstract: This paper addresses the problem of speaker-based audio data segmentation. A novel method that has the advantages of both model-based and metric-based techniques is proposed, which creates a model for each speaker from the available data on the fly. This can be viewed as building a Hidden Markov Model (HMM) with the speakers abstracted as hidden states, each speaker/state being modeled by a Gaussian Mixture Model (GMM). To prevent a large number of spurious change points from being detected, the use of the Generalized Likelihood Ratio (GLR) for grouping feature vectors is proposed. A clustering technique is described through which a good initialization of each GMM is achieved, such that each state corresponds to a single speaker and not to noise, silence, or word classes, something that may happen in conventional unlabelled clustering. Finally, a refinement method along the lines of Viterbi training of HMMs is presented. The method does not require prior knowledge of any speaker characteristics, nor does it require tuning of threshold parameters, so it can be used with confidence on new data sets. It assumes that the number of speakers is known a priori to be two. Results on the files reported for the baseline system show an 84.75% decrease in error rate. The method performs just as well even when segments are as short as 1 s each, an improvement over some previous methods, which need larger segments for accurate detection of change points.
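Two ingredients named in the abstract, a GLR distance between groups of feature vectors and Viterbi decoding over an HMM whose hidden states are the speakers, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each speaker is modeled by a single diagonal-covariance Gaussian instead of a GMM, the GLR is computed with maximum-likelihood Gaussian estimates, and all names (`fit_diag_gaussian`, `glr_distance`, `viterbi_speakers`, `VAR_FLOOR`, `switch_penalty`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Model = Tuple[List[float], List[float]]  # (means, variances) of a diagonal Gaussian
VAR_FLOOR = 1e-6  # assumption: keeps log-variances finite for near-constant data


def fit_diag_gaussian(frames: Sequence[Frame]) -> Model:
    """Maximum-likelihood mean and floored variance for each feature dimension."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(d)]
    variances = [
        max(sum((f[k] - means[k]) ** 2 for f in frames) / n, VAR_FLOOR)
        for k in range(d)
    ]
    return means, variances


def log_det(variances: Sequence[float]) -> float:
    """log|Sigma| of a diagonal covariance matrix."""
    return sum(math.log(v) for v in variances)


def glr_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Log GLR: separate Gaussians for x and y versus one pooled Gaussian.

    With ML estimates the constant terms cancel, leaving
    D = 0.5 * (N_z log|S_z| - N_x log|S_x| - N_y log|S_y|), where z = x + y.
    A large D suggests x and y come from different speakers.
    """
    z = list(x) + list(y)
    return 0.5 * (
        len(z) * log_det(fit_diag_gaussian(z)[1])
        - len(x) * log_det(fit_diag_gaussian(x)[1])
        - len(y) * log_det(fit_diag_gaussian(y)[1])
    )


def log_likelihood(frame: Frame, model: Model) -> float:
    """Log density of one frame under a diagonal Gaussian."""
    means, variances = model
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xk - m) ** 2 / v
        for xk, m, v in zip(frame, means, variances)
    )


def viterbi_speakers(
    frames: Sequence[Frame], models: Sequence[Model], switch_penalty: float = 10.0
) -> List[int]:
    """Most likely speaker index per frame; each speaker change costs switch_penalty."""
    states = range(len(models))
    score = [log_likelihood(frames[0], m) for m in models]
    backpointers: List[List[int]] = []
    for frame in frames[1:]:
        new_score, ptr = [], []
        for s in states:
            # Best predecessor, charging the penalty only when the speaker changes.
            cand = [score[p] - (0.0 if p == s else switch_penalty) for p in states]
            best = max(states, key=lambda p: cand[p])
            ptr.append(best)
            new_score.append(cand[best] + log_likelihood(frame, models[s]))
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best-scoring final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy usage with deterministic, hypothetical 2-D "feature vectors".
def jitter(i: int) -> float:
    return 0.1 * ((i % 5) - 2)


speaker_a = [(0.0 + jitter(i), 1.0 - jitter(i)) for i in range(30)]
speaker_b = [(5.0 + jitter(i), -3.0 + jitter(i)) for i in range(30)]
frames = speaker_a[:15] + speaker_b + speaker_a[15:]

print(glr_distance(speaker_a, speaker_b) > glr_distance(speaker_a[:15], speaker_a[15:]))  # True
models = [fit_diag_gaussian(speaker_a), fit_diag_gaussian(speaker_b)]
labels = viterbi_speakers(frames, models)
changes = [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
print(changes)  # [15, 45]
```

Here the two speaker models are fit to known data purely for illustration; in the paper they come from the clustering-based initialization. A Viterbi-training refinement would alternate between decoding with `viterbi_speakers` and refitting each model on the frames assigned to it. Note also that the fixed `switch_penalty` is a tunable constant introduced for this sketch, whereas the abstract states that the proposed method needs no threshold tuning.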
