Authors: X. Anguera, C. Wooters, J. Hernando
DOI: 10.1109/ASRU.2005.1566478
Keywords:
Abstract: One of the sub-tasks of the Spring 2004 and Spring 2005 NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately, and then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative approach in which we use delay-and-sum beamforming techniques to fuse the signals from each of the multiple distant microphones into a single enhanced signal. This can be thought of as a many-to-one pre-processing approach. In the pre-processing approach we propose, the time delay of arrival (TDOA) between each of the multiple distant channels and a reference channel is computed incrementally using a window that steps through the signals from each of the multiple microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed, and the resulting "enhanced" signal is clustered using our standard speaker diarization system. We test our approach on the 2004 and 2005 evaluation databases and show that the technique performs very well.
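
The pre-processing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the TDOA is estimated, so this sketch assumes a brute-force time-domain cross-correlation, and the function names (`estimate_delay`, `delay_and_sum`) and parameters (`window`, `step`, `max_lag`, all in samples) are hypothetical. It uses only the Python standard library and assumes all channels have the same length.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag d (in samples) that maximizes sum(ref[i] * sig[i + d]).

    Assumption: plain cross-correlation over lags in [-max_lag, max_lag];
    the abstract does not name the estimator actually used.
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, ref_index=0, window=64, step=32, max_lag=8):
    """Fuse equal-length channels into one signal by aligning and averaging.

    For each analysis window (length `window`, advanced by `step`), the delay
    of every channel relative to the reference channel is estimated, and the
    delay-compensated samples of the first `step` positions are accumulated.
    No microphone positions are needed, only the signals themselves.
    """
    ref = channels[ref_index]
    n = len(ref)
    out = [0.0] * n
    for start in range(0, n, step):
        seg_end = min(start + window, n)
        ref_seg = ref[start:seg_end]
        for ch in channels:
            lag = estimate_delay(ref_seg, ch[start:seg_end], max_lag)
            for i in range(start, min(start + step, n)):
                j = i + lag  # read the channel at its delay-compensated index
                if 0 <= j < n:
                    out[i] += ch[j]
    # Average rather than sum so the output stays on the input scale.
    return [v / len(channels) for v in out]


# Toy usage: a second channel that is the first one delayed by 3 samples.
signal = [0.0] * 256
for k in range(8):
    signal[10 + 32 * k] = 1.0
delayed = [0.0] * 3 + signal[:-3]
enhanced = delay_and_sum([signal, delayed], window=64, step=32, max_lag=8)
```

In this toy case the delayed copy is shifted back before averaging, so `enhanced` matches the reference channel instead of smearing each impulse across two positions, as a naive average of the unaligned channels would. The resulting single signal is what would then be passed to a diarization system.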