Clustering of triphones using phoneme similarity estimation for the definition of a multilingual set of triphones

Authors: Bojan Imperl, Zdravko Kačič, Bogomir Horvat, Andrej Žgank

DOI: 10.1016/S0167-6393(02)00048-1

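Abstract: This paper addresses the problem of multilingual acoustic modelling for the design of multilingual speech recognisers. An agglomerative clustering algorithm for the definition of a multilingual set of triphones is proposed. The algorithm is based on an indirect distance measure for triphones, defined as a weighted sum of explicit estimates of context similarity at the monophone level. The similarity estimation method is based on the method of Houtgast. The new algorithm was tested in a multilingual speech recognition experiment for three languages, where it was applied to the monolingual triphone sets of the language-specific recognisers for all three languages. To evaluate the algorithm, its performance was compared with that of a reference system composed of the language-specific recognisers operating in parallel, and with that of a multilingual set of triphones produced by a tree-based clustering algorithm. All experiments were based on the 1000 FDB SpeechDat(II) databases (Slovenian, Spanish and German). The experiments showed that the proposed algorithm yields a significant reduction in the number of triphones with only a minor degradation of the recognition rate.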
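The abstract does not state the exact distance formula or the merging rule, so the following is only a minimal Python sketch of the general idea under stated assumptions: each triphone is a (left context, centre phone, right context) tuple; the monophone distances, which the paper estimates with Houtgast's method, are supplied as a precomputed table (Houtgast's method itself is not implemented here); the per-position weights are free parameters; and clusters are merged with average linkage until no pair is closer than a threshold. The names `phone_distance`, `triphone_distance`, `cluster_triphones`, the default weights and the toy distances are all hypothetical, not taken from the paper.

```python
from itertools import combinations

# A triphone is modelled here as (left context, centre phone, right context).
Triphone = tuple[str, str, str]


def phone_distance(a: str, b: str, table: dict[frozenset, float]) -> float:
    """Monophone distance from a precomputed symmetric table (assumed input)."""
    if a == b:
        return 0.0
    return table[frozenset((a, b))]


def triphone_distance(t1: Triphone, t2: Triphone, table: dict[frozenset, float],
                      weights: tuple[float, float, float] = (0.25, 0.5, 0.25)) -> float:
    """Indirect triphone distance: weighted sum of per-position monophone distances.

    The default weights are hypothetical, not values from the paper."""
    return sum(w * phone_distance(p, q, table) for w, p, q in zip(weights, t1, t2))


def cluster_triphones(triphones: list[Triphone], table: dict[frozenset, float],
                      threshold: float,
                      weights: tuple[float, float, float] = (0.25, 0.5, 0.25)
                      ) -> list[list[Triphone]]:
    """Bottom-up (agglomerative) clustering with average linkage (an assumption).

    Starts from singleton clusters and repeatedly merges the closest pair
    until the smallest inter-cluster distance exceeds `threshold`."""
    clusters = [[t] for t in triphones]

    def linkage(c1: list[Triphone], c2: list[Triphone]) -> float:
        # Average of all pairwise triphone distances between the two clusters.
        total = sum(triphone_distance(a, b, table, weights) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations() yields i < j, so index i stays valid
    return clusters


if __name__ == "__main__":
    # Toy monophone distances, invented for illustration only.
    table = {
        frozenset(("k", "g")): 0.2,
        frozenset(("k", "t")): 0.4,
        frozenset(("g", "t")): 0.5,
        frozenset(("a", "e")): 0.3,
    }
    triphones = [("k", "a", "t"), ("g", "a", "t"), ("k", "e", "t"), ("t", "a", "k")]
    for cluster in cluster_triphones(triphones, table, threshold=0.18):
        print(cluster)
```

With these toy values and a threshold of 0.18, the first three triphones end up in one shared cluster and ("t", "a", "k") stays on its own. Raising the threshold produces fewer, larger clusters, i.e. a smaller shared triphone inventory, which mirrors the trade-off the abstract reports between the number of triphones and the recognition rate.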

References (21)
Alex Waibel, Tanja Schultz. Multilingual and Crosslingual Speech Recognition. Proceedings of the DARPA Broadcast News Transcription and Understanding, pp. 259-262 (1998). DOI: 10.5445/IR/44598
Shubha Kadambe, James Hieronymus. Spontaneous speech language identification with a knowledge of linguistics. Conference of the International Speech Communication Association (1994).
Filippo Gallocchio, Giorgio Micca, Patrizia Bonaventura. Multilingual speech recognition for flexible vocabularies. Conference of the International Speech Communication Association (1997).
Alex Waibel, Tanja Schultz. Language adaptive LVCSR through Polyphone Decision Tree Specialization. Workshop on Multi-lingual Interoperability in Speech Technology, pp. 85-90 (2000).
Andreas Stolcke, Fuliang Weng, Harry Bratt, Leonardo Neumeyer. A study of multilingual speech recognition. Conference of the International Speech Communication Association (1997).
Andrej Žgank, Finn Tore Johansen, Narada D. Warakagoda, Gunnar Lehtinen, Kjell Elenius, Giampiero Salvi, Børge Lindberg, Zdravko Kačič. The COST 249 SpeechDat multilingual reference recogniser. Language Resources and Evaluation (2000).
F. Metze, T. Kemp, T. Schaaf, T. Schultz, H. Soltau. Confidence measure based language identification. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1827-1830 (2000). DOI: 10.1109/ICASSP.2000.862110
Henk van den Heuvel, Louis Boves, Asuncion Moreno, Maurizio Omologo, Gaël Richard, Eric Sanders. Annotation in the SpeechDat Projects. International Journal of Speech Technology, vol. 4, pp. 127-143 (2001). DOI: 10.1023/A:1011375311203
Etienne Barnard, Kay Margarethe Berkling. Automatic language identification with sequences of language-independent phoneme clusters. Oregon Graduate Institute of Science and Technology (1996).
Hervé Bourlard, Hynek Hermansky, Nelson Morgan. Towards increasing speech recognition error rates. Speech Communication, vol. 18, pp. 205-231 (1996). DOI: 10.1016/0167-6393(96)00003-9