Derivation of the optimal set of phonetic transcriptions for a word from its acoustic realizations

Authors: Houda Mokbel, D. Jouvet

DOI: 10.1016/S0167-6393(99)00021-7

Keywords: Transcription (linguistics), Phonotactics, Maximum likelihood, Computer science, Speech recognition, Phonetics, Pronunciation, Speech processing, Decoding methods, Training set

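Abstract: This paper deals with a set of methods developed in order to derive multiple variants of phonetic transcriptions for words, given sample utterances of the words and an inventory of context-dependent sub-word units. These methods use a two-phase process to derive the transcriptions. The first phase consists of generating possible transcriptions for a word using an N-best decoding of the available utterances of that word. The second phase consists of selecting, for each word, the transcriptions that describe its utterances best. The selection of the “best” transcriptions is accomplished according to different criteria. The Frequency criterion chooses the k most frequent of all the generated transcriptions, while the Maximum Likelihood (ML) criterion chooses the k most likely ones. With these criteria, k is the same whatever the word is, and each transcription “describes” all the training utterances of the word. A partition procedure, which determines the “optimal” number of transcriptions for each word, is then investigated. This procedure assumes that, among the selected transcriptions, each transcription must “describe” a subset of the training utterances. So, the goal is to find the “suitable” number of transcriptions and associate them with the pronunciations (utterances). Two iterative algorithms are evaluated, and the compromise between the likelihood and the number of elements is studied. Speaker-independent speech recognition experiments showed that the ML criterion outperforms the Frequency criterion, and that the performance obtained with the former is comparable to that obtained with the reference transcriptions. Moreover, results on speaker-independent tasks showed an improvement when using partitioned transcriptions, i.e. when each transcription represents only a subset of the utterances.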
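The selection phase described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the N-best lists and a table `loglik` of per-utterance log-likelihoods (one score per candidate transcription per utterance, e.g. from forced alignment) are assumed to be given, the candidate strings and scores are made-up toy data, and the functions `frequency_select`, `ml_select`, `partition` and `optimal_partition`, the k-means-style update, and the linear `penalty` on the number of transcriptions are all hypothetical choices rather than the two algorithms evaluated in the paper.

```python
from collections import Counter

def frequency_select(nbest_lists, k):
    """Frequency criterion: the k candidates occurring most often across all
    N-best lists of the word (ties kept in order of first appearance)."""
    counts = Counter(t for nbest in nbest_lists for t in nbest)
    return [t for t, _ in counts.most_common(k)]

def ml_select(loglik, k):
    """ML criterion (assumed form): the k candidates with the highest total
    log-likelihood over all training utterances.
    loglik[u][t] = log P(utterance u | transcription t) (hypothetical input)."""
    totals = {t: sum(row[t] for row in loglik) for t in loglik[0]}
    return sorted(totals, key=totals.get, reverse=True)[:k]

def partition(loglik, init, max_iter=20):
    """Hypothetical k-means-style partition: each selected transcription
    "describes" the subset of utterances it scores best on."""
    candidates = list(loglik[0])
    selected = list(init)
    for _ in range(max_iter):
        # Assignment step: map every utterance to its best selected transcription.
        assign = [max(selected, key=lambda t: row[t]) for row in loglik]
        # Update step: re-pick, for each subset, the candidate that maximizes
        # the subset's total log-likelihood; empty subsets are dropped.
        new = []
        for t in selected:
            members = [row for row, a in zip(loglik, assign) if a == t]
            if not members:
                continue
            best = max(candidates, key=lambda c: sum(r[c] for r in members))
            if best not in new:
                new.append(best)
        if set(new) == set(selected):
            break
        selected = new
    assign = [max(selected, key=lambda t: row[t]) for row in loglik]
    total = sum(row[a] for row, a in zip(loglik, assign))
    return selected, assign, total

def optimal_partition(loglik, max_m, penalty):
    """Try m = 1..max_m seeds (the m ML-best candidates) and keep the partition
    maximizing total log-likelihood - penalty * (number of transcriptions)."""
    best = None
    for m in range(1, max_m + 1):
        sel, assign, total = partition(loglik, ml_select(loglik, m))
        score = total - penalty * len(sel)
        if best is None or score > best[0]:
            best = (score, sel, assign)
    return best[1], best[2]

# Toy data (invented): 4 utterances of one word, 3 candidate transcriptions.
nbest_lists = [["f@nEtr", "fnEtr"], ["f@nEtr", "fnEt"],
               ["fnEtr", "fnEt"], ["fnEtr", "fnEt"]]
loglik = [
    {"f@nEtr": -10.0, "fnEtr": -14.0, "fnEt": -16.0},
    {"f@nEtr": -11.0, "fnEtr": -15.0, "fnEt": -17.0},
    {"f@nEtr": -15.0, "fnEtr": -9.0, "fnEt": -12.0},
    {"f@nEtr": -16.0, "fnEtr": -10.0, "fnEt": -11.0},
]

if __name__ == "__main__":
    print(frequency_select(nbest_lists, 2))  # ['fnEtr', 'fnEt']
    print(ml_select(loglik, 2))              # ['fnEtr', 'f@nEtr']
    print(optimal_partition(loglik, max_m=3, penalty=3.0))
    # (['fnEtr', 'f@nEtr'], ['f@nEtr', 'f@nEtr', 'fnEtr', 'fnEtr'])
```

On this toy data the two global criteria disagree on the second variant, and the partition assigns the first two utterances to one transcription and the last two to the other. Raising the hypothetical `penalty` (for example to 10.0) makes the single-transcription solution win, which is one simple way to express the trade-off between likelihood and the number of transcriptions.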
