Probabilistic Integration of Joint Density Model and Speaker Model for Voice Conversion

Authors: Nobuaki Minematsu, Daisuke Saito, Shinji Watanabe, Atsushi Nakamura

DOI:

Keywords:

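Abstract: This paper describes a novel approach to voice conversion using both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian Mixture Model (GMM) with probabilistic densities of joint vectors of a source and a target speaker are widely used to estimate a transformation. However, for sufficient quality, they require a parallel corpus which contains plenty of utterances with the same linguistic content spoken by both speakers. In addition, joint density GMM methods often suffer from over-training effects when the amount of training data is small. To compensate for these problems, we propose a novel approach that integrates the speaker GMM of the target with the joint density model using a probabilistic formulation. The proposed method trains the joint density model with a few parallel utterances, and the speaker model with non-parallel data of the target, independently. It eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the amount of parallel data is small.

Index Terms: voice conversion, joint density model, speaker model, probabilistic unification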
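
For orientation, the joint density GMM conversion that the abstract takes as its baseline is commonly written as below. This is a sketch of the standard minimum mean-square error formulation from the GMM-based voice conversion literature, not an equation quoted from this paper. The symbols (source frame x_t, target frame y_t, mixture weights w_m, means μ_m, covariances Σ_m, M mixture components) are notation assumed here for illustration.

```latex
% Joint vector of time-aligned source and target features (assumed notation)
z_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}, \qquad
P(z_t) = \sum_{m=1}^{M} w_m \,
  \mathcal{N}\!\left(z_t;\ \mu_m^{(z)},\ \Sigma_m^{(z)}\right)

% Block structure of each component's mean and covariance
\mu_m^{(z)} = \begin{bmatrix} \mu_m^{(x)} \\ \mu_m^{(y)} \end{bmatrix}, \qquad
\Sigma_m^{(z)} = \begin{bmatrix}
  \Sigma_m^{(xx)} & \Sigma_m^{(xy)} \\
  \Sigma_m^{(yx)} & \Sigma_m^{(yy)}
\end{bmatrix}

% Conversion: posterior-weighted sum of per-component linear regressions
\hat{y}_t = \sum_{m=1}^{M} P(m \mid x_t)
  \left[ \mu_m^{(y)} + \Sigma_m^{(yx)} \left(\Sigma_m^{(xx)}\right)^{-1}
         \left(x_t - \mu_m^{(x)}\right) \right]

% Component posterior given only the source frame
P(m \mid x_t) =
  \frac{w_m \, \mathcal{N}\!\left(x_t;\ \mu_m^{(x)},\ \Sigma_m^{(xx)}\right)}
       {\sum_{n=1}^{M} w_n \, \mathcal{N}\!\left(x_t;\ \mu_n^{(x)},\ \Sigma_n^{(xx)}\right)}
```

The cross-covariance blocks Σ_m^{(yx)} can only be estimated from time-aligned parallel frames, which is why this baseline depends on a parallel corpus and tends to over-train when that corpus is small. The abstract does not state the formula used to integrate the target speaker model, so it is not reproduced here.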
