Authors: Narayanaswamy Balakrishnan, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah
DOI:
Keywords:
Abstract: This paper addresses the problem of speaker-based audio data segmentation. A novel method that has the advantages of both model-based and metric-based techniques is proposed, which creates a model for each speaker from the available data on the fly. The method can be viewed as building a Hidden Markov Model (HMM) for the data, with the speakers abstracted as the hidden states. Each speaker/state is modeled with a Gaussian Mixture Model (GMM). To prevent a large number of spurious change points from being detected, the use of the Generalized Likelihood Ratio (GLR) for grouping feature vectors is proposed. A clustering technique is described through which a good initialization of each GMM is achieved, such that each state corresponds to a single speaker and not to noise, silence, or word classes, something that may happen in conventional unlabelled clustering. Finally, a refinement method along the lines of Viterbi training of HMMs is presented. The method does not require prior knowledge of any speaker characteristics. It also does not require tuning of threshold parameters, so it can be used with confidence over new data sets. The method assumes that the number of speakers is known a priori to be two. It results in a decrease in error rate of 84.75% on the files reported for the baseline system. It performs just as well even when the speaker segments are as short as 1 s each, a large improvement over some previous methods, which require larger segments for accurate detection of speaker change points.
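
The GLR grouping step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each segment is modeled by a single diagonal-covariance Gaussian (the paper uses GMMs), and the names `log_det_diag_cov`, `glr_distance`, and the variance `floor` are hypothetical. Under that assumption the log-form GLR distance between two segments reduces to a difference of log-determinants, and a small value suggests the two segments could be grouped as coming from the same source.

```python
import math


def log_det_diag_cov(frames, floor=1e-6):
    """Log-determinant of the maximum-likelihood diagonal covariance of `frames`.

    `frames` is a list of equal-length feature vectors (lists of floats).
    `floor` is an assumed variance floor that keeps the logarithm finite.
    """
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, floor))
    return total


def glr_distance(x, y):
    """Log-form GLR distance between segments x and y.

    Compares modeling x and y with separate Gaussians against modeling
    their concatenation with one Gaussian:
        d = (N_z log|S_z| - N_x log|S_x| - N_y log|S_y|) / 2
    Larger values indicate the segments are less likely to share a source.
    """
    z = x + y  # pooled segment
    return 0.5 * (
        len(z) * log_det_diag_cov(z)
        - len(x) * log_det_diag_cov(x)
        - len(y) * log_det_diag_cov(y)
    )


# Toy 2-dimensional "feature vectors" (illustrative values, not real speech data).
seg_a = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
seg_b = [[v + 10.0 for v in frame] for frame in seg_a]  # same shape, shifted mean

same = glr_distance(seg_a, seg_a)
different = glr_distance(seg_a, seg_b)
```

In this toy case `same` is close to zero while `different` is clearly positive, which is the property a grouping step would rely on; how the paper combines this measure with GMM initialization and Viterbi-style refinement is not specified in the abstract.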