Methods for generating pitch and duration contours in a text to speech system

Authors: Ellen M. Eide, Robert E. Donovan

Keywords: Constant (mathematics), Speech recognition, Speech synthesis, Duration (music), Electroglottograph, Stress (linguistics), Computer science, Block (data storage), Signal, Natural (music)

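Abstract: A method for automatically generating pitch contours in a text-to-speech (TTS) system, the system converting input text into an output acoustic signal simulating natural speech, the method comprising the steps of: storing a plurality of associated stress and pitch level pairs, each of the pairs including a lexical stress level and a pitch level; calculating the stress levels of the input text; comparing the stress levels of the input text to the stored stress levels to find the closest stored stress levels; and copying the pitch levels associated with the closest stored stress levels to generate the pitch contours of the input text. Features of various illustrative modes of the invention include pitch levels that correspond to the end of vowels, use of a phonetic dictionary to expand words into phonemes and concatenate stress levels, blocking sentences into constant or variable lengths by segmenting from the ends of sentences toward the beginnings, and averaging pitch levels at block boundaries. The method may distinguish among declarations, questions, and exclamations. Training data may be collected from more than one speaker and scaled; the speaker(s) may wear a laryngograph to provide vocal cord activity.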
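
The steps named in the abstract (blocking from the end of the sentence, finding the closest stored stress pattern, copying its pitch levels, and averaging at block boundaries) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the distance measure, the block length, or how boundary averaging is applied, so the squared-difference distance, the default block length of 3, the rule of replacing the two pitch values on either side of a boundary with their mean, and all names and pitch values (in Hz) in `EXAMPLE_INVENTORY` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A stored training pair: (lexical stress levels, pitch levels in Hz),
# one value per vowel. Values below are invented for illustration only.
Block = Tuple[Sequence[int], Sequence[float]]

EXAMPLE_INVENTORY: List[Block] = [
    ([0, 1, 0], [100.0, 140.0, 110.0]),
    ([1, 0, 0], [150.0, 120.0, 90.0]),
    ([1, 0], [130.0, 96.0]),
    ([0, 1], [105.0, 135.0]),
]


def split_from_end(stress: Sequence[int], block_len: int) -> List[List[int]]:
    """Cut a stress sequence into blocks, segmenting from the end of the
    sentence toward the beginning, so a short leftover block lands first."""
    blocks: List[List[int]] = []
    end = len(stress)
    while end > 0:
        start = max(0, end - block_len)
        blocks.append(list(stress[start:end]))
        end = start
    blocks.reverse()  # restore left-to-right order
    return blocks


def closest_block(target: Sequence[int], inventory: Sequence[Block]) -> Block:
    """Return the stored pair whose stress pattern is nearest the target.
    Assumption: squared-difference distance over same-length patterns."""
    candidates = [b for b in inventory if len(b[0]) == len(target)]
    if not candidates:
        raise ValueError(f"no stored block of length {len(target)}")
    return min(
        candidates,
        key=lambda b: sum((s - t) ** 2 for s, t in zip(b[0], target)),
    )


def pitch_contour(
    stress: Sequence[int], inventory: Sequence[Block], block_len: int = 3
) -> List[float]:
    """Copy pitch levels from the closest stored blocks and average the
    two pitch values that meet at each block boundary (an assumed rule)."""
    contour: List[float] = []
    for block in split_from_end(stress, block_len):
        pitch = list(closest_block(block, inventory)[1])
        if contour:
            mean = (contour[-1] + pitch[0]) / 2
            contour[-1] = mean
            pitch[0] = mean
        contour.extend(pitch)
    return contour


if __name__ == "__main__":
    # Stress levels for a five-vowel sentence (hypothetical input).
    print(pitch_contour([1, 0, 0, 1, 0], EXAMPLE_INVENTORY))
```

With the invented inventory above, the five-vowel input is split into a leading block of two vowels and a final block of three, and the two pitch values that meet at the boundary are replaced by their mean.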

References (10)
Noriko Umeda, Cecil Howard Coker, "Conversion of printed text into synthetic speech" (1971)
Xuedong Huang, A. Acero, J. Adcock, Hsiao-Wuen Hon, J. Goldsmith, Jingsong Liu, M. Plumpe, "Whistler: a trainable text-to-speech system", International Conference on Spoken Language Processing, vol. 4, pp. 2387-2390 (1996), doi:10.1109/ICSLP.1996.607289
Lyubomir Y. Antonov, "Method of and device for synthesis of speech from printed text", The Journal of the Acoustical Society of America, vol. 78, pp. 1930-1930 (1985), doi:10.1121/1.392678
Sandra E. Hutchins, "Method and apparatus for speech synthesis based on prosodic analysis", Journal of the Acoustical Society of America, vol. 98, pp. 688-688 (1992), doi:10.1121/1.413554
Hector R. Javkin, "Synthesis-based speech training system and method", Journal of the Acoustical Society of America, vol. 101, pp. 2426- (1994), doi:10.1121/1.418512
X. Huang, A. Acero, H. Hon, Y. Ju, J. Liu, S. Meredith, M. Plumpe, "Recent improvements on Microsoft's trainable text-to-speech system-Whistler", 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 959-962 (1997), doi:10.1109/ICASSP.1997.596097
Alejandro Acero, Xuedong D. Huang, Michael D. Plumpe, James L. Adcock, "Method and system of runtime acoustic unit selection for speech synthesis" (1997)
G.D. Forney, "The Viterbi algorithm", Proceedings of the IEEE, vol. 61, pp. 268-278 (1973), doi:10.1109/PROC.1973.9030
John Nicholas Holmes, "Speech synthesis" (1972)