Authors: Mark Hasegawa-Johnson, Ken Chen, Aaron Cohen
DOI:
Keywords:
Abstract: Automatic prosody recognition (APR) is of fundamental importance for automatic speech understanding. In this paper, we propose a maximum likelihood prosody recognizer consisting of a GMM-based acoustic model that models the distribution of phone-level acoustic-prosodic observations (pitch, duration, and energy) and an ANN-based language model that models the word-level stochastic dependence between prosody and syntax. Our experiments on the Radio News Corpus show that our prosody recognizer is able to achieve 84% pitch accent recognition accuracy and 93% intonational phrase boundary (IPB) recognition accuracy in the leave-one-speaker-out task, which exceeds previously reported results on the same corpus. The recognizer is also tested on a subset of Switchboard; the accuracies are degraded but still significantly better than chance levels.