Authors: Felipe Cruz Salinas, Kenichi Kumatani, Robert Gmyr, Linquan Liu, Yu Shi
DOI:
Keywords:
Abstract: The sparsely-gated mixture of experts (MoE) architecture can scale out large Transformer models to orders of magnitude which are not achievable by dense models with the current …