Voice conversion by mapping the speaker-specific features using pitch synchronous approach

Author: K. Sreenivasa Rao

DOI: 10.1016/J.CSL.2009.03.003

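Abstract: The basic goal of a voice conversion system is to modify the speaker-specific characteristics while keeping the message and the environmental information contained in the speech signal intact. Speaker characteristics are reflected in speech at different levels, such as the shape of the glottal pulse (excitation source characteristics), the shape of the vocal tract (vocal tract system characteristics) and the long-term features (suprasegmental or prosodic characteristics). In this paper, we propose neural network models for developing mapping functions at each level. The features used for developing the mapping functions are extracted using pitch synchronous analysis. Pitch synchronous analysis provides accurate estimation of the vocal tract parameters by analyzing the speech signal independently in each pitch period, without being influenced by the adjacent pitch cycles. In this work, the instants of significant excitation are used as pitch markers to perform the pitch synchronous analysis. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations, like the onset of a burst, in the case of nonvoiced speech. Instants of significant excitation are computed from the linear prediction (LP) residual of speech signals by using the property of the average group-delay of minimum phase signals. Line spectral frequencies (LSFs) are used for representing the vocal tract characteristics and for developing the associated mapping function. The LP residual of the speech signal is viewed as the excitation source, and the residual samples around the instant of glottal closure are used for mapping. Prosodic parameters at the syllable and phrase levels are used for deriving the prosodic mapping function. Source and system level mapping functions are derived pitch synchronously, and the incorporation of target prosodic parameters is performed pitch synchronously using the instants of significant excitation. The performance of the voice conversion system is evaluated using listening tests. The prediction accuracy of the mapping functions (neural network models) used at different levels in the proposed voice conversion system is further evaluated using objective measures such as deviation ($D_i$), root mean square error ($\mu_{RMSE}$) and correlation coefficient ($\gamma_{X,Y}$). The proposed approach (i.e., mapping and modification of parameters using the pitch synchronous approach) is shown to perform better than the earlier method (mapping the vocal tract parameters using block processing) proposed by the author.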
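
The abstract names three objective measures (deviation, root mean square error and correlation coefficient) without defining them. The sketch below shows one common way such measures are computed between target and predicted parameter values. The formulas are assumptions, not taken from the paper: deviation is taken here as the per-sample absolute error in percent of the target value, RMSE as the plain root mean square error, and the correlation coefficient as Pearson's. The function names and the sample values are hypothetical.

```python
import math


def percent_deviation(target, predicted):
    """Per-sample deviation in percent of the target value (assumed definition)."""
    return [100.0 * abs(x - y) / x for x, y in zip(target, predicted)]


def rmse(target, predicted):
    """Root mean square error between target and predicted values."""
    n = len(target)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(target, predicted)) / n)


def correlation(target, predicted):
    """Pearson correlation coefficient between target and predicted values."""
    n = len(target)
    mean_x = sum(target) / n
    mean_y = sum(predicted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(target, predicted))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in target))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in predicted))
    return cov / (spread_x * spread_y)


# Hypothetical target and predicted values (e.g. pitch in Hz), for illustration only.
target = [100.0, 120.0, 150.0, 200.0]
predicted = [110.0, 114.0, 150.0, 190.0]

deviations = percent_deviation(target, predicted)
error = rmse(target, predicted)
gamma = correlation(target, predicted)
```

A lower deviation and RMSE, together with a correlation coefficient closer to 1, would indicate that the predicted values track the target values more closely.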
