CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement

Authors: Gang Liu, Ke Gong, Xiaodan Liang, Zhiguang Chen

DOI: 10.1109/ICASSP40776.2020.9054060

Keywords: Speech recognition, Context (language use), Noise (video), Speaker recognition, Computer science, Feature (machine learning), Speech enhancement, Pyramid (image processing), Discriminator

Abstract: … recognition and speaker recognition based on the enhanced audio, which also proves that our CP-GAN for speech enhancement is capable of boosting high-level speech tasks. …

References (10 of 21 shown)

Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur. Librispeech: An ASR corpus based on public domain audio books. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 5206-5210, 2015. DOI: 10.1109/ICASSP.2015.7178964

Joachim Thiemann, Nobutaka Ito, Emmanuel Vincent. The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings. Journal of the Acoustical Society of America, vol. 133, p. 3591, 2013. DOI: 10.1121/1.4806631

Christophe Veaux, Junichi Yamagishi, Simon King. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database. International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation, pp. 1-4, 2013. DOI: 10.1109/ICSDA.2013.6709856

Generative Adversarial Nets. Neural Information Processing Systems, vol. 27, pp. 2672-2680, 2014. DOI: 10.3156/JSOFT.29.5_177_2

Cees H. Taal, Richard C. Hendriks, Richard Heusdens, Jesper Jensen. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech. IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, pp. 2125-2136, 2011. DOI: 10.1109/TASL.2011.2114881

Yi Hu, Philipos C. Loizou. Evaluation of Objective Quality Measures for Speech Enhancement. IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 229-238, 2008. DOI: 10.1109/TASL.2007.911054

Donald S. Williamson, Yuxuan Wang, DeLiang Wang. Complex ratio masking for monaural speech separation. IEEE Transactions on Audio, Speech, and Language Processing, vol. 24, pp. 483-492, 2016. DOI: 10.1109/TASLP.2015.2512042

Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie. Feature Pyramid Networks for Object Detection. Computer Vision and Pattern Recognition (CVPR), pp. 936-944, 2017. DOI: 10.1109/CVPR.2017.106

Takaaki Hori, John R. Hershey, Shinji Watanabe, Tsubasa Ochiai. Multichannel end-to-end speech recognition. International Conference on Machine Learning (ICML), pp. 2632-2641, 2017.

Santiago Pascual, Antonio Bonafonte. Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model. 9th ISCA Speech Synthesis Workshop, pp. 112-117, 2016. DOI: 10.21437/SSW.2016-19
