Statistical approach to enhancing esophageal speech based on Gaussian mixture models

Authors: Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

DOI: 10.1109/ICASSP.2010.5495676

Abstract: This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it does not require any external devices, the generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose conversion from esophageal speech into normal speech. A spectral parameter and excitation parameters of the target normal speech are separately estimated based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement for flexibly controlling the enhanced voice quality.

References (12)
K. Matsui, N. Hara. Enhancement of esophageal speech using formant synthesis. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 81-84 (1999). DOI: 10.1109/ICASSP.1999.758067
Alain de Cheveigné, Haruhiro Katayose, Hideki Kawahara, Roy D. Patterson. Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity. Conference of the International Speech Communication Association, vol. 6, pp. 2781-2784 (1999).
K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1315-1318 (2000). DOI: 10.1109/ICASSP.2000.861820
Y. Stylianou, O. Cappe, E. Moulines. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, vol. 6, pp. 131-142 (1998). DOI: 10.1109/89.661472
Hiroshi Saruwatari, Tomoki Toda, Hironori Doi, Keigo Nakamura, Kiyohiro Shikano. Enhancement of Esophageal Speech Using Statistical Voice Conversion. Asia Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 805-808 (2009).
Kenji Matsui, Noriyo Hara, Noriko Kobayashi, Hajime Hirose. Enhancement of esophageal speech using formant synthesis. Acoustical Science and Technology, vol. 23, pp. 69-76 (2002). DOI: 10.1250/AST.23.69
Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano. One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 1249-1252 (2007). DOI: 10.1109/ICASSP.2007.367303