Authors: Houda Mokbel, D. Jouvet
DOI: 10.1016/S0167-6393(99)00021-7
Keywords: Transcription (linguistics), Phonotactics, Maximum likelihood, Computer science, Speech recognition, Phonetics, Pronunciation, Speech processing, Decoding methods, Training set
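Abstract: This paper deals with a set of methods developed to derive multiple variants of phonetic transcriptions for words, given sample utterances of the words and an inventory of context-dependent sub-word units. These methods use a two-phase process to derive the transcriptions. The first phase consists of generating possible transcriptions of a word using N-best decoding of the utterances available for that word. The second phase consists of selecting, for each word, the transcriptions that best describe the utterances of the word. The selection of the “best” transcriptions is accomplished according to different criteria. The Frequency criterion chooses the k most frequent transcriptions among all the transcriptions, while the Maximum Likelihood (ML) criterion chooses the k most likely ones. With these criteria, the number of transcriptions is the same whatever the word is, and each transcription “describes” all the training utterances. A partition procedure, which determines the “optimal” number of transcriptions for each word, is then investigated. The procedure assumes that, within the selected transcriptions, each transcription must “describe” a subset of the training utterances. So, the goal is to find the “suitable” transcriptions and to associate them with the pronunciations (utterances). Two iterative algorithms are evaluated, and the compromise between the likelihood and the number of elements is studied. Speaker-independent speech recognition experiments showed that the ML criterion outperforms the Frequency criterion, and that the performance obtained with the former is comparable to that obtained with reference transcriptions. Moreover, results on speaker-independent tasks showed an improvement when using partitioned transcriptions, i.e. when each transcription represents only a subset of the utterances.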
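The selection phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the representation of a transcription as a space-separated string of units, and the toy scores are all hypothetical. The sketch also assumes that the ML criterion ranks a transcription by the sum of its log-likelihoods over all training utterances, and that partitioning assigns each utterance to the selected transcription that scores it highest. The abstract mentions two iterative partition algorithms without detailing them, so only a single assignment step is shown.

```python
from collections import Counter


def select_by_frequency(nbest_lists, k):
    """Frequency criterion (sketch): keep the k transcriptions that occur
    most often across the N-best lists of all training utterances."""
    counts = Counter(t for nbest in nbest_lists for t in nbest)
    return [t for t, _ in counts.most_common(k)]


def select_by_likelihood(scores, k):
    """ML criterion (sketch, assumed form): keep the k transcriptions with
    the highest total log-likelihood over all training utterances.

    scores maps a transcription to a list of per-utterance log-likelihoods.
    """
    totals = {t: sum(lls) for t, lls in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)[:k]


def partition_utterances(scores, selected):
    """One assignment step (sketch): attach each utterance index to the
    selected transcription that gives it the highest log-likelihood."""
    n_utterances = len(next(iter(scores.values())))
    groups = {t: [] for t in selected}
    for u in range(n_utterances):
        best = max(selected, key=lambda t: scores[t][u])
        groups[best].append(u)
    return groups


# Hypothetical toy data: three utterances of one word, with 2-best lists
# and made-up log-likelihoods for each candidate transcription.
nbest_lists = [
    ["t ah m", "t ow m"],
    ["t ah m", "t aa m"],
    ["t ow m", "t ah m"],
]
scores = {
    "t ah m": [-10.0, -12.0, -15.0],
    "t ow m": [-14.0, -13.0, -9.0],
    "t aa m": [-20.0, -11.0, -18.0],
}

by_frequency = select_by_frequency(nbest_lists, k=2)
by_likelihood = select_by_likelihood(scores, k=2)
groups = partition_utterances(scores, by_likelihood)
```

On this toy data the two criteria keep the same two transcriptions but rank them differently, and the assignment step leaves each kept transcription responsible for only a subset of the utterances, which is the situation the partition procedure targets.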