Speaker diarization for multi-party meetings using acoustic fusion

Authors: X. Anguera, C. Wooters, J. Hernando

DOI: 10.1109/ASRU.2005.1566478

Abstract: One of the sub-tasks of the Spring 2004 and 2005 NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately, and then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative approach in which we use delay-and-sum beamforming techniques to fuse the signals from the multiple distant microphones into a single enhanced signal. This can be thought of as a many-to-one pre-processing approach. In the pre-processing we propose, the time delay of arrival (TDOA) between each of the distant channels and a reference channel is computed incrementally using a window that steps through the signals from the microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed, and the resulting "enhanced" signal is clustered using our standard speaker diarization system. We test the approach on the evaluation databases and show that the technique performs very well.
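The pre-processing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the TDOA is estimated, so plain time-domain cross-correlation is used here as a stand-in, and the function names (`estimate_delay`, `delay_and_sum`) and parameters (`window`, `step`, `max_lag`, all in samples) are hypothetical choices made for illustration.

```python
def estimate_delay(ref, sig, start, length, max_lag):
    """Return the lag d (in samples) that maximizes sum(ref[n] * sig[n + d])
    over the analysis window ref[start:start + length].

    Assumption: plain cross-correlation stands in for whatever TDOA
    estimator the paper actually uses.
    """
    stop = min(start + length, len(ref))
    best_lag, best_score = 0, float("-inf")
    # Try small lags first so that ties resolve to the smallest |lag|.
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = 0.0
        for n in range(start, stop):
            m = n + lag
            if 0 <= m < len(sig):
                score += ref[n] * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, ref_index=0, window=64, step=32, max_lag=8):
    """Fuse several channels into one signal by aligning each channel to a
    reference channel and averaging.

    Delays are re-estimated every `step` samples from a `window`-sample
    analysis window. Returns (enhanced_signal, per_window_delays).
    """
    ref = channels[ref_index]
    enhanced = [0.0] * len(ref)
    delays = []
    for start in range(0, len(ref), step):
        # The reference channel is aligned with itself by definition.
        lags = [
            0 if i == ref_index else estimate_delay(ref, ch, start, window, max_lag)
            for i, ch in enumerate(channels)
        ]
        delays.append(lags)
        for n in range(start, min(start + step, len(ref))):
            total, count = 0.0, 0
            for ch, lag in zip(channels, lags):
                m = n + lag
                if 0 <= m < len(ch):  # skip samples shifted out of range
                    total += ch[m]
                    count += 1
            enhanced[n] = total / count  # count >= 1: the reference always contributes
    return enhanced, delays
```

As in the abstract, the sketch needs no information about microphone positions: only the relative delays between each channel and the reference are estimated. In the system described, the fused signal would then be passed to the speaker diarization stage, which is outside the scope of this sketch.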

References (9)
Xavier Anguera, Chuck Wooters, Barbara Peskin, James Fung. Towards Robust Speaker Segmentation: The ICSI-SRI Fall 2004 Diarization System. (2004)
Martin Graciarena, Andreas Stolcke, Ivan Bulyko, Nikki Mirghafori, Chuck Wooters, Barbara Peskin, David Gelbart, Mari Ostendorf, Scott Otterson, Tuomo W. Pirinen. From Switchboard to Meetings: Development of the 2004 ICSI-SRI-UW Meeting Recognition System. Conference of the International Speech Communication Association (2004)
Hans-Günter Hirsch. HMM adaptation for applications in telecommunication. Speech Communication, vol. 34, pp. 127-139 (2000). DOI: 10.1016/S0167-6393(00)00050-9
M.S. Brandstein, H.F. Silverman. A robust method for speech signal time-delay estimation in reverberant rooms. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 375-378 (1997). DOI: 10.1109/ICASSP.1997.599651
J. L. Flanagan, J. D. Johnston, R. Zahn, G. W. Elko. Computer-steered microphone arrays for sound transduction in large rooms. Journal of the Acoustical Society of America, vol. 78, pp. 1508-1518 (1985). DOI: 10.1121/1.2022858
J. Ajmera, C. Wooters. A robust speaker clustering algorithm. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 411-416 (2003). DOI: 10.1109/ASRU.2003.1318476
Xavier Anguera, Chuck Wooters, Barbara Peskin, Mateu Aguiló. Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System. Machine Learning for Multimodal Interaction, pp. 402-414 (2006). DOI: 10.1007/11677482_34
Qin Jin, Tanja Schultz. Speaker Segmentation and Clustering in Meetings. Conference of the International Speech Communication Association (2004)
Corinne Fredouille, Daniel Moraru, Sylvain Meignier, Laurent Besacier, Jean-François Bonastre. The NIST 2004 spring rich transcription evaluation: two-axis merging strategy in the context of multiple distance microphone based meeting speaker segmentation. RT2004 Spring Meeting Recognition Workshop (2004)