Voice segmentation system based on energy estimation

Authors: Raissa B. Rocha, Marcelo S. Alencar, Virginio V. Freire

DOI: 10.5281/ZENODO.44106

Keywords: Acoustic model, Speech synthesis, Voice analysis, Voice activity detection, Encoder, Speech analytics, Hidden Markov model, Segmentation, Speech recognition, Speech segmentation, Speech processing, Speech corpus, Computer science, Linear predictive coding

Abstract: Voice segmentation is used in speech recognition and synthesis systems, as well as in phonetic voice encoders. This paper describes an implicit segmentation system, which aims to estimate the boundaries between phonemes in a locution. To find the segmentation marks, the proposed method initially locates reference borders between silent periods and phonemes, and vice versa, by measuring the energy in short-duration periods. The marks are then found by means of encoding the region delimited by the borders that were detected. To evaluate the performance of the system, an objective evaluation using 50 locutions was performed. The system detected 72.41% of the marks, of which 77.6% were detected with an error less than or equal to 10 ms and 22.4% with an error of up to 20 ms.
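
The first stage described in the abstract, locating borders between silent periods and speech by measuring short-duration energy, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 10 ms frame length, the 8 kHz sampling rate, the fixed threshold, and the synthetic test signal are all assumptions chosen for illustration, and the second stage (encoding the delimited region to place phoneme marks) is not covered.

```python
import math


def short_time_energy(samples, frame_len, hop):
    """Return the mean squared amplitude of each frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def reference_borders(energies, threshold):
    """Return (frame_index, kind) for every silence/speech transition.

    A frame counts as speech when its energy exceeds the threshold.
    The fixed threshold is an assumption made for this sketch.
    """
    borders = []
    for i in range(1, len(energies)):
        was_speech = energies[i - 1] > threshold
        is_speech = energies[i] > threshold
        if is_speech and not was_speech:
            borders.append((i, "silence->speech"))
        elif was_speech and not is_speech:
            borders.append((i, "speech->silence"))
    return borders


# Synthetic stand-in for a locution (assumed data, not from the paper):
# 100 ms of silence, 200 ms of a 200 Hz tone, 100 ms of silence at 8 kHz.
rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
signal = silence + tone + silence

frame_len = 80  # 10 ms non-overlapping frames (assumed)
energies = short_time_energy(signal, frame_len, frame_len)
borders = reference_borders(energies, threshold=0.01)

# Convert frame indices to milliseconds from the start of the signal.
border_times_ms = [(i * frame_len * 1000 // rate, kind) for i, kind in borders]
print(border_times_ms)
```

On this synthetic signal the two borders fall on frame boundaries, so they are recovered exactly. Real speech would likely need overlapping frames and an adaptive threshold, and the frame length bounds how finely a border can be located, which matters when errors are measured against 10 ms and 20 ms tolerances.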
