A Spanish multispeaker database of esophageal speech

Authors: Luis Serrano García, Sneha Raman, Inma Hernáez Rioja, Eva Navas Cordón, Jon Sanchez

DOI: 10.1016/J.CSL.2020.101168

Keywords: Digital signal processing; Esophageal speech; Alaryngeal speech; Speech therapist; Database; Intelligibility (communication); Parallel corpora; Speech processing; Computer science; Laryngectomee

Abstract: A laryngectomee is a person whose larynx has been removed by surgery, usually due to laryngeal cancer. After surgery, most laryngectomees are able to speak again, using techniques that are learned with the help of a speech therapist. This is termed alaryngeal speech, and esophageal speech (ES) is one of the several alaryngeal speech production modes. A considerable amount of research has been dedicated to the study of alaryngeal speech, with a wide range of aims such as helping speech therapists in evaluation and diagnosis, and improving its quality and intelligibility with digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, which is designed to allow the development of new support technologies for this speech impairment. The database primarily consists of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. Additionally, it includes parallel recordings of the sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. Apart from the sentences, the database also includes recordings of sustained vowels and a set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocols and procedure, as well as the labeling process. The main acoustic characteristics of ES, such as speaking rate, durations of recordings, phones and silences, and other characteristics, are compared to those of a reduced database of healthy voices. In addition, we describe an experiment using the database to improve the performance of an ASR system for ES speakers. This resource will be made available to the scientific community, and we hope that it will be used to improve the quality of life of laryngectomees.

References (59)
Mary H. Bellandese, Jay W. Lerman, Harvey R. Gilbert. An Acoustic Analysis of Excellent Female Esophageal, Tracheoesophageal, and Laryngeal Speakers. Journal of Speech Language and Hearing Research, vol. 44, pp. 1315-1320 (2001). DOI: 10.1044/1092-4388(2001/102)
Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano. Statistical approach to enhancing esophageal speech based on Gaussian mixture models. International Conference on Acoustics, Speech, and Signal Processing, pp. 4250-4253 (2010). DOI: 10.1109/ICASSP.2010.5495676
Manwa L. Ng, Chui-Ling I. Kwok, Sau-Fong W. Chow. Speech performance of adult Cantonese-speaking laryngectomees using different types of alaryngeal phonation. Journal of Voice, vol. 11, pp. 338-344 (1997). DOI: 10.1016/S0892-1997(97)80013-6
F. Debruyne, P. Delaere, J. Wouters, P. Uwents. Acoustic analysis of tracheo-oesophageal versus oesophageal speech. Journal of Laryngology and Otology, vol. 108, pp. 325-328 (1994). DOI: 10.1017/S0022215100126660
Steve Young, Arantza del Pozo. Continuous tracheoesophageal speech repair. European Signal Processing Conference, pp. 1-5 (2006). DOI: 10.5281/ZENODO.39562
Ida K.-Y. Law, Estella P.-M. Ma, Edwin M.-L. Yiu. Speech intelligibility, acceptability, and communication-related quality of life in Chinese alaryngeal speakers. Archives of Otolaryngology-Head & Neck Surgery, vol. 135, pp. 704-711 (2009). DOI: 10.1001/ARCHOTO.2009.71
Thomas Drugman, Myriam Rijckaert, Claire Janssens, Marc Remacle. Tracheoesophageal speech. Computer Speech & Language, vol. 30, pp. 16-31 (2015). DOI: 10.1016/J.CSL.2014.07.003
Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano. An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5136-5139 (2011). DOI: 10.1109/ICASSP.2011.5947513
B. M. R. Op de Coul, F. J. M. Hilgers, A. J. M. Balm, I. B. Tan, F. J. A. van den Hoogen, H. van Tinteren. A decade of postlaryngectomy vocal rehabilitation in 318 patients: a single institution's experience with consistent application of Provox indwelling voice prostheses. Archives of Otolaryngology-Head & Neck Surgery, vol. 126, pp. 1320-1328 (2000). DOI: 10.1001/ARCHOTOL.126.11.1320