Authors: Narayanaswamy Balakrishnan, Richard M. Stern, Rashmi Gangadharaiah
DOI:
Keywords: Speaker diarisation, Change detection, Speech recognition, Computer science, Pattern recognition, Voting, Scale-space segmentation, Speaker recognition, Artificial intelligence, Cluster analysis, Segmentation
Abstract: The process of locating the end points of each speaker's voice in an audio file and then clustering segments based on speaker identity is called speaker segmentation. In this paper we present a method for two-speaker segmentation, though it can be extended to more than two speakers. Most methods for speaker segmentation start with an initial, computationally inexpensive method of segmentation, followed by a more accurate segment clustering. We describe a simple algorithm that improves the accuracy of segment clustering while not increasing computational complexity. Since segment clustering is done iteratively, the improvement in each step results in a significant overall increase in accuracy and cluster purity. We borrow ideas from speaker recognition to perform frame-based voting. We look at each frame as an independent classifier deciding which speaker generated the segment. These 'classifiers' are combined by voting to make a decision on which segments should be clustered together. This change leads to a 56.9% decrease in error rates on a two-speaker segmentation task on the SWITCHBOARD corpus. Index Terms: Speaker segmentation, Voting, Classifier combination, Change detection
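The frame-voting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the speaker models or features, so the diagonal-Gaussian speaker models, the one-dimensional toy frames, and the names `diag_gauss_loglik`, `vote_segment`, and `summed_loglik_segment` are all assumptions made for illustration. Each frame casts one vote for the speaker model that scores it highest, and the segment goes to the speaker with the most votes; a summed log-likelihood decision is included only for contrast.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]
# Hypothetical speaker model: (per-dimension mean, per-dimension variance).
Model = Tuple[Sequence[float], Sequence[float]]


def diag_gauss_loglik(frame: Frame, mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-likelihood of one frame under a diagonal Gaussian (assumed model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def vote_segment(frames: List[Frame], models: List[Model]) -> Tuple[int, List[int]]:
    """Treat every frame as an independent classifier and take a majority vote.

    Returns the index of the winning speaker model and the vote counts.
    Ties are broken in favour of the lower index.
    """
    votes = [0] * len(models)
    for frame in frames:
        scores = [diag_gauss_loglik(frame, mean, var) for mean, var in models]
        votes[scores.index(max(scores))] += 1
    return votes.index(max(votes)), votes


def summed_loglik_segment(frames: List[Frame], models: List[Model]) -> int:
    """Contrast case: pick the model with the highest total log-likelihood."""
    totals = [
        sum(diag_gauss_loglik(frame, mean, var) for frame in frames)
        for mean, var in models
    ]
    return totals.index(max(totals))


# Toy data: two speakers with unit variance, three typical frames, one outlier.
models = [([0.0], [1.0]), ([3.0], [1.0])]
segment = [[0.2], [0.5], [-0.1], [9.0]]

winner, votes = vote_segment(segment, models)
print("votes:", votes, "winner by voting:", winner)
print("winner by summed log-likelihood:", summed_loglik_segment(segment, models))
```

In this toy segment, three of the four frames vote for the first speaker, so voting assigns the segment to it, whereas the single outlying frame dominates the summed log-likelihood and flips that decision to the second speaker. This only shows one property of voting on made-up data; it says nothing about the reported SWITCHBOARD results.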