Using a priori information for speaker diarization.

Authors: Daniel Moraru, Laurent Besacier, Eric Castelli

Abstract: This paper presents an attempt to use supplementary information for audio data diarization. The approach is based on the use of a priori information about the speakers involved in the dialogue. This information consists of the number of speakers involved in the conversation, and training data available for one speaker or for all speakers in the conversation. The experiments were mainly conducted on the 2003 Rich Transcription Diarization corpus, on both the Dry Run Corpus and the Evaluation corpus. The results show that knowing the exact number of speakers does not seem to be very useful information. On the other hand, using a priori speaker models may improve diarization performance when enough data is available to train reliable models.
