Authors: Kenichi Kumatani, Robert Gmyr, Felipe Cruz Salinas, Linquan Liu, Wei Zuo
DOI:
Keywords: End-to-end model, Mixture of experts, Transformers
Abstract: … The sparsely-gated Mixture of Experts (MoE) can magnify a … More specifically, we apply the sparsely-gated MoE technique to two …