Towards High-Quality Next-Generation Text-to-Speech Synthesis: A Multidomain Approach by Automatic Domain Classification

Authors: F. Alias, X. Sevillano, J.C. Socoro, X. Gonzalvo

DOI: 10.1109/TASL.2008.925145

Keywords: Speech processing, Natural language processing, Word processing, Natural language, Speech recognition, Speech synthesis, Domain (software engineering), Artificial intelligence, Context (language use), Computer science, Text processing, Field (computer science)

Abstract: This paper is a contribution to the recent advancements in the development of high-quality next-generation text-to-speech (TTS) synthesis systems. Two of the hottest research topics in this area are oriented towards the improvement of speech expressiveness and the flexibility of synthesis. In this context, this paper presents a new TTS strategy called multidomain TTS (MD-TTS) for synthesizing among different domains. Although the multidomain philosophy has been widely applied in spoken language systems, few efforts have been conducted to extend it to the TTS field. To do so, several proposals are described in this paper. First, a text classifier (TC) is included in the classic TTS architecture in order to automatically conduct the selection of the most appropriate domain for the input text. In contrast to classic topic classification tasks, the MD-TTS TC should consider not only the contents of the text but also its structure. To this end, this paper introduces a new text modeling scheme based on an associative relational network, which represents texts as a directional weighted word-based graph. The experiments validate the proposal in terms of both objective (TC efficiency) and subjective (perceived synthetic speech quality) evaluation criteria.
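
To make the idea of a directional weighted word-based graph more concrete, here is a minimal Python sketch. It is not the paper's associative relational network: the abstract does not specify how nodes or edge weights are defined, so this example simply assumes that each distinct word is a node and that the weight of an edge counts how often one word is immediately followed by another. The function name `build_word_graph` and the whitespace tokenization are illustrative assumptions.

```python
from collections import defaultdict


def build_word_graph(text):
    """Build a directed weighted word graph as {source: {target: count}}.

    Assumption (not from the paper): an edge source -> target is weighted by
    the number of times `target` directly follows `source` in the text.
    """
    # Naive tokenization: lowercase and split on whitespace (illustrative only).
    words = text.lower().split()
    graph = defaultdict(lambda: defaultdict(int))
    # Each pair of adjacent words adds one unit of weight to a directed edge.
    for source, target in zip(words, words[1:]):
        graph[source][target] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {source: dict(targets) for source, targets in graph.items()}


graph = build_word_graph("the cat saw the dog and the cat ran")
print(graph["the"])
```

Because edges are directed, word order is preserved in the representation, which is one simple way a model can capture the structure of a text as well as its vocabulary.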
