Statistical approach to enhancing esophageal speech based on Gaussian mixture models

Authors: Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

DOI: 10.1109/ICASSP.2010.5495676

Abstract: This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it does not require any external devices, the generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose conversion from esophageal speech into normal speech. A spectral parameter and excitation parameters of the target normal speech are separately estimated based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in naturalness. We also apply one-to-many eigenvoice conversion to the enhancement, so that the enhanced voice quality can be controlled flexibly.
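The idea of estimating a target parameter from a source parameter with a Gaussian mixture model can be illustrated by a minimal sketch. It assumes scalar features, a joint GMM over (source, target) pairs whose parameters are already known, and the simple minimum mean-square-error mapping (a posterior-weighted sum of per-component conditional means). This is an illustration of the general GMM conversion technique, not necessarily the estimation procedure used in the paper. The function names, dictionary keys, and toy parameter values are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Estimate the target feature y from the source feature x.

    Each component is a dict with hypothetical keys:
      w      mixture weight
      mu_x   mean of the source feature
      mu_y   mean of the target feature
      var_x  variance of the source feature
      cov_yx covariance between target and source features
    """
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likelihoods)

    y_hat = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        # Conditional mean of y given x within this component.
        cond_mean = c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"])
        y_hat += posterior * cond_mean
    return y_hat


# Toy two-component model (values chosen only for illustration).
components = [
    {"w": 0.5, "mu_x": -2.0, "mu_y": 1.0, "var_x": 1.0, "cov_yx": 0.5},
    {"w": 0.5, "mu_x": 2.0, "mu_y": 5.0, "var_x": 1.0, "cov_yx": 0.8},
]

estimate = convert(0.0, components)
```

In practice the features are vectors (for example spectral coefficients), the variances become covariance matrices, and the model parameters are trained on time-aligned source and target frames. The scalar case keeps the mapping itself visible.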

References (12)

K. Matsui, N. Hara. Enhancement of esophageal speech using formant synthesis. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 81-84 (1999). DOI: 10.1109/ICASSP.1999.758067

Alain de Cheveigné, Haruhiro Katayose, Hideki Kawahara, Roy D. Patterson. Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity. Conference of the International Speech Communication Association, vol. 6, pp. 2781-2784 (1999).

K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1315-1318 (2000). DOI: 10.1109/ICASSP.2000.861820

Y. Stylianou, O. Cappe, E. Moulines. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, vol. 6, pp. 131-142 (1998). DOI: 10.1109/89.661472

Hideki Kawahara, Osamu Fujimura, Jo Estill. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT. MAVEBA, pp. 59-64 (2001).

A. Hisada, H. Sawada. Real-time clarification of esophageal speech using a comb filter (2002).

Hiroshi Saruwatari, Tomoki Toda, Hironori Doi, Keigo Nakamura, Kiyohiro Shikano. Enhancement of Esophageal Speech Using Statistical Voice Conversion. Asia Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 805-808 (2009).

Kenji Matsui, Noriyo Hara, Noriko Kobayashi, Hajime Hirose. Enhancement of esophageal speech using formant synthesis. Acoustical Science and Technology, vol. 23, pp. 69-76 (2002). DOI: 10.1250/AST.23.69

Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano. One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 1249-1252 (2007). DOI: 10.1109/ICASSP.2007.367303

Alain de Cheveigné, Hideki Kawahara, Ikuyo Masuda-Katsuse. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Communication, vol. 27, pp. 187-207 (1999). DOI: 10.1016/S0167-6393(98)00085-5
