Authors: Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan
DOI: 10.1109/ICASSP.2007.367209
Keywords:
Abstract: Most statistical speech recognition systems make use of segment-level features, derived mainly from spectral envelope characteristics of the signal, but ignore supra-segmental cues that carry additional information likely to be useful for recognition. These cues, which constitute the prosody of the utterance and occur at the syllable and word level, are closely related to the lexical and syntactic organization of the utterance. In this paper, we explore acoustic correlates of a subset of these cues in order to improve recognition performance on a read-speech corpus, using word error rate (WER) as the metric. Using the features and methods described, we were able to obtain a relative WER improvement of 1.3% over a baseline ASR system on the Boston University Radio News Corpus.