Author:
Keywords:
Abstract: The quality of unit selection based concatenative speech synthesis mainly depends on how well two successive units can be joined together to minimise the audible discontinuities. The objective measure of discontinuity used when selecting units is known as the join cost. The ideal join cost will measure perceived discontinuity, based on easily measurable spectral properties of the units being joined, in order to ensure smooth and natural-sounding synthetic speech. In this paper we describe a perceptual experiment conducted to measure the correlation between subjective human perception and various objective spectrally-based measures proposed in the literature. We also report new objective distance measures derived from various distance metrics based on these features, which have good correlation with human perception of concatenation discontinuities. Our experiments used a state-of-the-art unit-selection text-to-speech system: rVoice from Rhetorical Systems Ltd.