Knowledge distillation for mixture of experts models in speech recognition

Authors: Felipe Cruz Salinas, Kenichi Kumatani, Robert Gmyr, Linquan Liu, Yu Shi

DOI:

Keywords:

Abstract: The sparsely-gated mixture of experts (MoE) architecture can scale out large Transformer models to orders of magnitude which are not achievable by dense models with the current …
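To make the architecture referenced in the abstract concrete, below is a minimal sketch of a sparsely-gated MoE layer with top-k routing, the mechanism that lets Transformer capacity grow with the number of experts while each token only activates a few of them. This is an illustrative assumption, not the authors' implementation; all names (SparseMoE, num_experts, top_k) are hypothetical.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts (MoE) layer.
# Hypothetical example; not the implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: produces one logit per expert for every token.
        self.gate = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward blocks replacing the dense FFN.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token is sent to its top-k experts only,
        # so compute scales with k rather than with the total number of experts.
        logits = self.gate(x)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoE(d_model=256, d_ff=1024)
    tokens = torch.randn(32, 256)
    print(layer(tokens).shape)  # torch.Size([32, 256])
```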
