Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features

Authors: Deepu Vijayasenan, Fabio Valente, Hervé Bourlard

DOI: 10.1016/J.SPECOM.2011.07.001

Abstract: Many state-of-the-art diarization systems for meeting recordings are based on the HMM/GMM framework and the combination of spectral (MFCC) and time delay of arrival (TDOA) features. This paper presents an extensive study of how multistream diarization can be improved beyond these two feature sets. Several other features have been proven effective for speaker diarization, but little effort has been devoted to integrating them into the MFCC+TDOA baseline. To the authors' best knowledge, no positive results have been reported so far. The first contribution of this paper is an analysis of the reasons for this. A set of oracle experiments investigates the robustness of HMM/GMM diarization when other features (the modulation spectrum and the frequency domain linear prediction features) are also integrated. The second contribution is a non-parametric multistream diarization method based on the information bottleneck (IB) approach. In contrast to the HMM/GMM system, which relies on log-likelihood combination, it combines the feature streams in a normalized space of relevance variables. The previous analysis is then repeated. It reveals that the proposed approach is more robust and can actually benefit from sources of information beyond the conventional MFCC and TDOA features. Experiments on the Rich Transcription data (heterogeneous meetings recorded in different rooms) show that it achieves a very competitive error of only 6.3% when four feature streams are used, compared to 14.9% for the HMM/GMM system. These results are analyzed in terms of their sensitivity to the stream weightings. To the authors' best knowledge, this is the first successful attempt to reduce the error by combining other features with MFCC and TDOA. It is also the first study of the shortcomings of HMM/GMM in going beyond the MFCC+TDOA baseline. As a last contribution, the paper addresses issues related to the computational complexity of multistream approaches.
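
The abstract contrasts two ways of fusing feature streams. The HMM/GMM system uses a weighted combination of per-stream log-likelihoods. The IB system combines the streams in a normalized space of relevance variables. The sketch below illustrates that contrast under simplifying assumptions and is not the authors' implementation. The function names and example numbers are hypothetical. It also assumes that every stream is expressed over one shared set of relevance variables y and that the IB-side weights are non-negative and sum to one.

```python
import math
from typing import Sequence


def combine_log_likelihoods(stream_loglikes: Sequence[float],
                            weights: Sequence[float]) -> float:
    """HMM/GMM-style fusion: weighted sum of per-stream log-likelihoods.

    Each stream's log-likelihood keeps its own dynamic range, so the
    weights multiply quantities that need not be on a comparable scale.
    """
    if len(stream_loglikes) != len(weights):
        raise ValueError("need exactly one weight per stream")
    return sum(w * ll for w, ll in zip(weights, stream_loglikes))


def combine_relevance_posteriors(stream_posteriors: Sequence[Sequence[float]],
                                 weights: Sequence[float]) -> list[float]:
    """IB-style fusion (simplified, hypothetical): weighted average of p(y|x).

    Assumption: all streams share the same relevance variables y, and the
    weights are non-negative and sum to one.
    """
    if len(stream_posteriors) != len(weights):
        raise ValueError("need exactly one weight per stream")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    size = len(stream_posteriors[0])
    combined = [0.0] * size
    for w, post in zip(weights, stream_posteriors):
        if len(post) != size:
            raise ValueError("all streams must share the same relevance variables")
        total = sum(post)
        # Renormalize each stream so it contributes a proper distribution.
        for j, p in enumerate(post):
            combined[j] += w * p / total
    return combined


if __name__ == "__main__":
    # Hypothetical values for two streams.
    print(combine_log_likelihoods([-120.0, -8.0], [0.9, 0.1]))
    print(combine_relevance_posteriors([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
                                       [0.5, 0.5]))
```

In the normalized space, every stream contributes a distribution that sums to one, so the weights act on quantities of the same scale and the result is again a distribution. The log-likelihood combination carries no such bound.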
