Authors: Deepu Vijayasenan, Fabio Valente, Hervé Bourlard
DOI: 10.1016/J.SPECOM.2011.07.001
Keywords:
Abstract: Many state-of-the-art diarization systems for meeting recordings are based on the HMM/GMM framework and on the combination of spectral (MFCC) and time delay of arrivals (TDOA) features. This paper presents an extensive study of how multistream diarization can be improved beyond these two sets of features. While several other features have been proven effective for speaker diarization, little effort has been devoted to integrating them into the MFCC+TDOA baseline and, to the authors' best knowledge, no positive results have been reported so far. The first contribution of this paper consists in analyzing the reasons for this, investigating through a set of oracle experiments the robustness of HMM/GMM diarization when other features (the modulation spectrum features and the frequency domain linear prediction features) are also integrated. The second contribution consists in introducing a non-parametric multistream diarization method based on the information bottleneck (IB) approach. In contrast to the HMM/GMM system, which makes use of log-likelihood combination, it combines the feature streams in a normalized space of relevance variables. The previous analysis is repeated, revealing that the proposed approach is more robust and can actually benefit from sources of information beyond the conventional MFCC and TDOA features. Experiments on the rich transcription data (heterogeneous meetings recorded in different rooms) show that the proposed system achieves a very competitive error of only 6.3% when four feature streams are used, compared to 14.9% for the HMM/GMM system. Those results are analyzed in terms of sensitivity to the stream weightings. To the authors' best knowledge, this is the first successful attempt to reduce the speaker error by combining other features with the MFCC and TDOA features, and the first study that shows the shortcomings of HMM/GMM in going beyond this baseline. As a last contribution, the paper also addresses issues related to the computational complexity of multistream approaches.
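The contrast the abstract draws between the two combination schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, weights, and numeric values are hypothetical, and the weighted-average form of the relevance-variable combination is an assumption about what "combining in a normalized space" means here. The point it shows is that per-stream log-likelihoods may live on very different scales, whereas per-stream distributions over relevance variables each sum to one.

```python
import math

def combine_log_likelihoods(stream_logliks, weights):
    """Weighted sum of per-stream log-likelihoods (HMM/GMM-style combination).

    The streams are unnormalized, so one stream can dominate purely
    because of its dynamic range.
    """
    return sum(w * ll for w, ll in zip(weights, stream_logliks))

def combine_relevance_posteriors(stream_posteriors, weights):
    """Weighted average of per-stream distributions over relevance variables.

    Assumed form of the IB-style combination: every input is a probability
    distribution, so with weights summing to one the output is one as well.
    """
    n = len(stream_posteriors[0])
    return [
        sum(w * p[k] for w, p in zip(weights, stream_posteriors))
        for k in range(n)
    ]

# Hypothetical values for a single speech segment and two streams.
weights = [0.7, 0.3]                      # illustrative stream weights
loglik = combine_log_likelihoods([-42.0, -3.5], weights)

mfcc_post = [0.6, 0.3, 0.1]               # made-up distribution from MFCC
tdoa_post = [0.2, 0.5, 0.3]               # made-up distribution from TDOA
combined = combine_relevance_posteriors([mfcc_post, tdoa_post], weights)

print(loglik)
print(combined, math.fsum(combined))
```

In this toy case the combined relevance distribution stays normalized whatever the streams' original scales, which is one plausible reading of why the abstract reports lower sensitivity to the stream weightings.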