A Maximum Likelihood Prosody Recognizer

Authors: Mark Hasegawa-Johnson, Ken Chen, Aaron Cohen

Abstract: Automatic prosody recognition (APR) is of fundamental importance for automatic speech understanding. In this paper, we propose a maximum likelihood prosody recognizer consisting of a GMM-based acoustic model, which models the distribution of phone-level acoustic-prosodic observations (pitch, duration and energy), and an ANN-based language model, which models the word-level stochastic dependence between prosody and syntax. Our experiments on the Radio News Corpus show that our recognizer is able to achieve 84% pitch accent recognition accuracy and 93% intonational phrase boundary (IPB) recognition accuracy in a leave-one-speaker-out task, which exceeds previously reported results on the same corpus. When the recognizer is tested on a subset of Switchboard, accuracies are degraded but still significantly better than chance levels.
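
The way an acoustic model and a language model combine under a maximum likelihood criterion can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the diagonal-covariance mixture form, the toy parameters, and the use of a fixed table of label probabilities in place of the ANN output are all assumptions made for illustration. The sketch scores each candidate prosody label by adding the GMM log-likelihood of the acoustic-prosodic features to the log-probability the language model assigns to that label, then picks the best-scoring label.

```python
import math


def gmm_log_likelihood(x, components):
    """Log density of feature vector x under a diagonal-covariance GMM.

    components: list of (weight, means, variances) tuples (assumed format).
    """
    log_terms = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def recognize(x, acoustic_models, label_probs):
    """Return the label maximizing acoustic log-likelihood + log label probability.

    label_probs stands in for the language model output P(label | syntax);
    here it is a plain dict (hypothetical), not an actual neural network.
    """
    scores = {
        label: gmm_log_likelihood(x, acoustic_models[label]) + math.log(label_probs[label])
        for label in acoustic_models
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up models over normalized (pitch, duration, energy) features.
ACOUSTIC_MODELS = {
    "accented": [(1.0, (1.0, 1.0, 1.0), (1.0, 1.0, 1.0))],
    "unaccented": [(1.0, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))],
}

if __name__ == "__main__":
    label, _ = recognize((0.8, 0.5, 1.2), ACOUSTIC_MODELS,
                         {"accented": 0.5, "unaccented": 0.5})
    print(label)
```

With equal label probabilities the decision rests on the acoustic evidence alone; a strongly skewed label probability can override weak acoustic evidence, which is the role the syntax-conditioned language model plays in the combined recognizer.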
