Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing

Authors: Chenggang Mi, Lei Xie, Yanning Zhang

Abstract: High-quality end-to-end speech translation models rely on large-scale speech-to-text training data, which is usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. Specifically, we first generate large-scale target-side paraphrases with a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model combining semantic similarity and speech–word pair co-occurrence is proposed to select the highest-scoring source speech–target paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant …
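The candidate-selection step described in the abstract (scoring source speech–target paraphrase pairs and keeping the best ones) can be illustrated with a minimal sketch. The abstract does not say how the semantic-similarity and co-occurrence signals are computed or combined, so this sketch assumes both are already available as scores in [0, 1] and merges them by a hypothetical linear interpolation with weight `weight`. The names `Candidate`, `filter_score` and `select_top` are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One source-speech / target-paraphrase pair (hypothetical structure)."""
    speech_id: str
    paraphrase: str
    semantic_sim: float  # assumed precomputed semantic similarity, in [0, 1]
    cooccurrence: float  # assumed precomputed speech-word co-occurrence score, in [0, 1]


def filter_score(c: Candidate, weight: float = 0.5) -> float:
    # Hypothetical linear interpolation of the two filtering signals;
    # the paper's actual combination is not given in the abstract.
    return weight * c.semantic_sim + (1.0 - weight) * c.cooccurrence


def select_top(candidates: List[Candidate], k: int = 1,
               weight: float = 0.5) -> Dict[str, List[Candidate]]:
    """Keep the k highest-scoring paraphrases for each source utterance."""
    by_speech: Dict[str, List[Candidate]] = {}
    for c in candidates:
        by_speech.setdefault(c.speech_id, []).append(c)
    return {
        sid: sorted(group, key=lambda c: filter_score(c, weight), reverse=True)[:k]
        for sid, group in by_speech.items()
    }


# Toy usage with made-up scores (not data from the paper).
demo = [
    Candidate("utt1", "the cat sat on the mat", 0.9, 0.6),
    Candidate("utt1", "a cat was sitting on the mat", 0.8, 0.9),
    Candidate("utt2", "hello there", 0.7, 0.2),
]
best = select_top(demo, k=1)
for sid, kept in best.items():
    print(sid, kept[0].paraphrase)
```

With equal weighting, the second `utt1` paraphrase (0.85) outranks the first (0.75), so it is the one kept for training; raising `weight` shifts the preference toward semantic similarity.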

Full text: sciencedirect.com
