Authors: Chen-Yu Chiang, Sin-Horng Chen, Hsiu-Min Yu, Yih-Ru Wang
DOI: 10.1121/1.3056559
Keywords: Syllable, Speech recognition, Feature (machine learning), Mandarin Chinese, Variation (linguistics), Nonverbal communication, Prosody, Natural language, Juncture, Linguistics, Computer science, Discriminative model
Abstract: An unsupervised joint prosody labeling and modeling method for Mandarin speech is proposed, a new scheme intended to construct statistical prosodic models and to label prosodic tags consistently for Mandarin speech. Two types of prosodic tags are determined by four prosodic models designed to illustrate the hierarchy of Mandarin prosody: the break of a syllable juncture, to demarcate prosodic constituents, and the prosodic state, to represent any prosodic domain’s pitch-level variation resulting from its upper-layered prosodic constituents’ influences. The performance of the proposed method was evaluated using an unlabeled read-speech corpus articulated by an experienced female announcer. Experimental results showed that the estimated parameters of the four prosodic models were able to explore and describe the structures and patterns of Mandarin prosody. In addition, certain corresponding relationships between the labeled break indices and the associated words were found, which manifested the connections between prosodic and linguistic parameters, a finding further verifying the capability of the method presented. Finally, a quantitative comparison of labeling results between the proposed method and human labelers indicated that the former was more consistent and discriminative than the latter in prosodic feature distributions, a merit of the method developed here for applications of prosody modeling.