CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement

Authors: Gang Liu, Ke Gong, Xiaodan Liang, Zhiguang Chen

DOI: 10.1109/ICASSP40776.2020.9054060

Keywords: Speech recognition, Context (language use), Noise (video), Speaker recognition, Computer science, Feature (machine learning), Speech enhancement, Pyramid (image processing), Discriminator

Abstract: … recognition and speaker recognition based on the enhanced audios, which also proves that our CP-GAN for speech enhancement is capable of boosting the high-level speech tasks. …

References (21)
Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur. Librispeech: An ASR corpus based on public domain audio books. International Conference on Acoustics, Speech, and Signal Processing, pp. 5206-5210 (2015). DOI: 10.1109/ICASSP.2015.7178964
Joachim Thiemann, Nobutaka Ito, Emmanuel Vincent. The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings. Journal of the Acoustical Society of America, vol. 133, pp. 3591-3591 (2013). DOI: 10.1121/1.4806631
Christophe Veaux, Junichi Yamagishi, Simon King. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database. International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation, pp. 1-4 (2013). DOI: 10.1109/ICSDA.2013.6709856
Generative Adversarial Nets. Neural Information Processing Systems, vol. 27, pp. 2672-2680 (2014). DOI: 10.3156/JSOFT.29.5_177_2
Cees H. Taal, Richard C. Hendriks, Richard Heusdens, Jesper Jensen. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech. IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, pp. 2125-2136 (2011). DOI: 10.1109/TASL.2011.2114881
Yi Hu, Philipos C. Loizou. Evaluation of Objective Quality Measures for Speech Enhancement. IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 229-238 (2008). DOI: 10.1109/TASL.2007.911054
Donald S. Williamson, Yuxuan Wang, DeLiang Wang. Complex ratio masking for monaural speech separation. IEEE Transactions on Audio, Speech, and Language Processing, vol. 24, pp. 483-492 (2016). DOI: 10.1109/TASLP.2015.2512042
Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie. Feature Pyramid Networks for Object Detection. Computer Vision and Pattern Recognition, pp. 936-944 (2017). DOI: 10.1109/CVPR.2017.106
Takaaki Hori, John R. Hershey, Shinji Watanabe, Tsubasa Ochiai. Multichannel end-to-end speech recognition. International Conference on Machine Learning, pp. 2632-2641 (2017)
Santiago Pascual, Antonio Bonafonte. Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model. 9th ISCA Speech Synthesis Workshop, pp. 112-117 (2016). DOI: 10.21437/SSW.2016-19