Authors: Mari Ostendorf, Rebecca Bates, Izhak Shafran
DOI:
Keywords:
Abstract: This paper describes a formal model for incorporating prosody in the speech recognition process, both for improving word recognition directly and for jointly recognizing words and underlying structure. The model includes the possibility of using an intermediate symbolic representation as well as direct conditioning on acoustic correlates. Alternatives for feature extraction are described, together with their implications for statistical modeling. Examples for spontaneous speech include clustering for dynamic pronunciation