Nonlinear emotional prosody generation and annotation

Authors: Jianhua Tao, Jian Yu, Yongguo Kang

DOI: 10.1007/11939993_23

Abstract: Emotion is an important element in expressive speech synthesis. The paper briefly analyzes prosody parameters, stresses, rhythms, and paralinguistic information in different emotional speech, and labels the speech with rich multi-layer annotation. A CART model is then used for prosody generation. Unlike the traditional linear modification method, which directly modifies F0 contours and syllabic durations from acoustic distributions such as topline, baseline, and intensities, the CART models try to map the subtle prosodic differences between neutral and emotional speech within various context information. Experiments show that the model is able to generate good outputs; however, results could be improved if more information, such as breaks and jitter, were integrated.
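The contrast the abstract draws between linear modification and a CART mapping can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the two binary context features (`stressed`, `phrase_final`), the toy F0 ratios, and the choice of an emotional-to-neutral F0 ratio as the regression target. The tree is a small hand-written regression tree that splits by squared-error reduction, which is the general CART idea, not the authors' implementation.

```python
from statistics import mean

# Hypothetical toy data: ((stressed, phrase_final), emotional/neutral F0 ratio).
# Feature names and values are invented for illustration only.
TRAIN = [
    ((1, 0), 1.4), ((1, 0), 1.4),
    ((0, 0), 1.1), ((0, 0), 1.1),
    ((0, 1), 0.9),
    ((1, 1), 1.0),
]

def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows):
    """Return (feature_index, threshold) that lowers the SSE most, or None."""
    best = None
    best_cost = sse([t for _, t in rows])
    for j in range(len(rows[0][0])):
        for thr in sorted({f[j] for f, _ in rows}):
            left = [t for f, t in rows if f[j] <= thr]
            right = [t for f, t in rows if f[j] > thr]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best_cost, best = cost, (j, thr)
    return best

def build_tree(rows, depth=0, max_depth=3):
    """Grow a regression tree; each leaf stores the mean target of its rows."""
    split = best_split(rows) if depth < max_depth and len(rows) >= 2 else None
    if split is None:
        return {"value": mean(t for _, t in rows)}
    j, thr = split
    left = [r for r in rows if r[0][j] <= thr]
    right = [r for r in rows if r[0][j] > thr]
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree(left, depth + 1, max_depth),
        "right": build_tree(right, depth + 1, max_depth),
    }

def predict(tree, features):
    """Walk from the root to a leaf and return its stored ratio."""
    while "value" not in tree:
        go_left = features[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]

def linear_modification(neutral_f0, global_ratio):
    """Linear baseline: one ratio applied to every syllable, ignoring context."""
    return neutral_f0 * global_ratio

def cart_modification(tree, neutral_f0, features):
    """Context-dependent modification: the ratio is looked up in the tree."""
    return neutral_f0 * predict(tree, features)

GLOBAL_RATIO = mean(t for _, t in TRAIN)
TREE = build_tree(TRAIN)

if __name__ == "__main__":
    for ctx in [(1, 0), (0, 1)]:
        print(ctx,
              linear_modification(200.0, GLOBAL_RATIO),
              cart_modification(TREE, 200.0, ctx))
```

In this toy setting the linear baseline scales every syllable by the same average ratio, while the tree returns a different ratio for a stressed non-final syllable than for an unstressed phrase-final one. That context sensitivity is the property the abstract attributes to the CART approach.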
