Authors: Dacheng Tao, Liang Ding, Di Wu, Yiren Chen
DOI:
Keywords:
Abstract: A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones. For example, intent detection (ID) and slot filling (SF) require automatic speech recognition (ASR) to transform voice into text. In this setting, perturbations, e.g. ASR errors, environmental noise, and careless user speaking, will propagate to the ID and SF models, thus deteriorating their performance. Therefore, well-performing models are expected to be resistant to such perturbations to some extent. However, existing models are trained on clean data, which causes a \textit{gap between clean-data training and real-world inference.} To bridge the gap, we propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space. Meanwhile, we design a denoising generation model to reduce the impact of low-quality samples. Experiments on a widely-used dataset, i.e. Snips, and a large-scale in-house dataset (10 million examples) demonstrate that our method not only outperforms the baseline models on the (noisy) corpus but also enhances robustness, that is, it produces high-quality results under a noisy environment. The source code is released.
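The abstract does not give the paper's exact objective, but the domain-adaptation idea it describes, pulling the embeddings of high-quality (clean) and low-quality (noisy, e.g. ASR-corrupted) versions of the same utterance into a similar vector space, can be illustrated with an alignment penalty. The following NumPy sketch is purely hypothetical: the function name, embedding shapes, and noise simulation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def alignment_loss(clean_emb, noisy_emb):
    """Mean squared distance between clean- and noisy-utterance embeddings.

    Minimizing a term like this (alongside the usual ID/SF task losses)
    encourages high- and low-quality samples to share a vector space,
    which is the domain-adaptation intuition sketched in the abstract.
    Shapes here are illustrative: (num_utterances, embedding_dim).
    """
    return float(np.mean((clean_emb - noisy_emb) ** 2))

# Toy example: 4 utterances with 8-dim embeddings; the "noisy" versions
# simulate perturbations such as ASR errors with additive Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 8))
noisy = clean + 0.1 * rng.normal(size=(4, 8))

loss = alignment_loss(clean, noisy)
```

In a full training loop, this penalty would typically be weighted and added to the ID/SF losses so that the encoder learns noise-invariant representations; the denoising generation model mentioned in the abstract is a separate component not sketched here.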