Fast speaker change detection for broadcast news transcription and indexing.

Authors: Francis Kubala, Daben Liu

Keywords: Transcription (software), Phone, Speech recognition, Search engine indexing, Change detection, Computer science, Multimedia

Abstract: In this paper, we describe a new speaker change detection (SCD) algorithm designed for fast transcription and audio indexing of spoken broadcast news. We have developed a two-stage approach that begins with a gender-independent phone-class recognition pass. We collapse the phoneme inventory to only 4 broad classes and include different models of non-speech, resulting in a small decoder that runs in less than 0.1 times real-time. The second stage of SCD hypothesizes a boundary between every phone labeled in the input. The phone-level time resolution of our approach permits it to run quickly while maintaining the same accuracy as a frame-level approach. Applying the algorithms to a large sample of news programs resulted in improvements in accuracy and in speech recognition speed.
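
The second stage described in the abstract can be illustrated with a minimal sketch of candidate generation only. It assumes the first pass yields a list of time-stamped phone-class segments. The `Phone` type, the class labels, the function names, and the 10 ms frame step are all hypothetical choices made for illustration; the abstract does not name the classes or the test applied at each candidate boundary, so no scoring step is shown.

```python
from typing import List, NamedTuple


class Phone(NamedTuple):
    label: str    # hypothetical broad-class label (the abstract does not name the classes)
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def phone_level_candidates(phones: List[Phone]) -> List[float]:
    """Return one candidate change time per junction between adjacent phones."""
    return [left.end for left, _right in zip(phones, phones[1:])]


def frame_level_candidate_count(duration: float, frame_step: float = 0.01) -> int:
    """Count the interior frame boundaries a frame-level search would test.

    The 10 ms default frame step is an assumption, not a value from the abstract.
    """
    return max(int(round(duration / frame_step)) - 1, 0)


# Toy output of the phone-class recognition pass for one second of audio.
phones = [
    Phone("nonspeech", 0.00, 0.30),
    Phone("vowel", 0.30, 0.42),
    Phone("fricative", 0.42, 0.55),
    Phone("vowel", 0.55, 0.70),
    Phone("nonspeech", 0.70, 1.00),
]

candidates = phone_level_candidates(phones)
print(candidates)
print(len(candidates), "phone-level candidates vs",
      frame_level_candidate_count(1.0), "frame-level candidates")
```

In this toy input the phone-level search tests far fewer hypotheses than a frame-level search over the same audio, which is consistent with the speed advantage the abstract attributes to phone-level time resolution.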
