Speaker diarization for multi-party meetings using acoustic fusion

Authors: X. Anguera, C. Wooters, J. Hernando

DOI: 10.1109/ASRU.2005.1566478


Abstract: One of the sub-tasks of the Spring 2004 and 2005 NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately, then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative in which we use delay-and-sum beamforming techniques to fuse the signals into a single enhanced signal. In the pre-processing we propose, the time delay of arrival (TDOA) between each channel and a reference channel is computed incrementally using a window that steps through the signals from the microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed, and the resulting "enhanced" signal is clustered using our standard diarization system. We test the approach on the evaluation databases and show that the technique performs very well.
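The windowed align-and-sum step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which TDOA estimator is used, so plain FFT-based cross-correlation is assumed here, and the function names (`estimate_delay`, `delay_and_sum`), the window and hop sizes, and the lag limit are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Estimate how many samples `sig` lags behind `ref` (assumed estimator:
    plain cross-correlation, searched over lags in [-max_lag, +max_lag])."""
    n = len(ref) + len(sig)  # zero-pad so the correlation is not circular
    spec = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(spec, n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def delay_and_sum(channels, ref_index=0, win=8000, hop=4000, max_lag=400):
    """Fuse several microphone channels into one signal.

    For each analysis window, every channel's delay relative to the
    reference channel is estimated, the channel is shifted to compensate,
    and the aligned channels are averaged. Windows are overlap-added.
    """
    channels = np.asarray(channels, dtype=float)
    n_ch, n_samp = channels.shape
    out = np.zeros(n_samp)
    weight = np.zeros(n_samp)
    taper = np.hanning(win)  # smooths the transitions between windows

    for start in range(0, n_samp - win + 1, hop):
        ref = channels[ref_index, start:start + win]
        acc = np.zeros(win)
        for ch in range(n_ch):
            seg = channels[ch, start:start + win]
            lag = estimate_delay(ref, seg, max_lag)
            # Read the channel `lag` samples later so it lines up with ref;
            # indices are clipped at the signal boundaries.
            idx = np.clip(np.arange(start + lag, start + lag + win), 0, n_samp - 1)
            acc += channels[ch, idx]
        out[start:start + win] += taper * acc / n_ch
        weight[start:start + win] += taper

    # Normalise by the summed taper; uncovered samples stay at zero.
    return out / np.maximum(weight, 1e-12)
```

As in the abstract, the sketch needs no information about microphone positions: only the per-window delays relative to the chosen reference channel. The fused output would then be passed to a single-channel diarization system, which is outside the scope of this example.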


