Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

Authors: Yu Tsao, Tai-Shih Chi, Syu-Siang Wang, Cheng Yu, Yun-Ju Chan

Abstract: Multi-task learning (MTL) and attention mechanisms have been shown to extract robust acoustic features effectively for various speech-related tasks in noisy environments. In this study, …
