Authors: Françoise Beaufays, Ananda Theertha Suresh, Michael Riley, Cyril Allauzen, Mingqing Chen
DOI:
Keywords:
Abstract: We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smart phones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users' data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the devices.
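The training loop described in the abstract is the FederatedAveraging algorithm: each round, clients update a copy of the global model on their local data, and the server replaces the global model with a data-size-weighted average of the client models. The sketch below is a rough illustration only, not the paper's implementation: a toy linear model in NumPy stands in for the recurrent language model, and client sampling, capitalization handling, and the server-side n-gram approximation are all omitted. The names `local_sgd` and `federated_averaging` are hypothetical.

```python
import numpy as np

def local_sgd(w, X, y, epochs=1, lr=0.1):
    """Client-side update: a few epochs of gradient descent on local data.
    A toy linear model with squared loss stands in for the RNN LM."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(w, clients, rounds=20):
    """Server loop: broadcast the global model, collect client updates,
    and average them weighted by each client's data size."""
    for _ in range(rounds):
        updates = [(local_sgd(w.copy(), X, y), len(y)) for X, y in clients]
        total = sum(n for _, n in updates)
        w = sum(n / total * wi for wi, n in updates)
    return w

# Toy usage: three "devices", each holding a private shard of data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

w = federated_averaging(np.zeros(2), clients)
print(w)  # approaches true_w; raw data is never pooled on the server
```

Only model parameters cross the network in this scheme; the raw examples stay on each device, which is the privacy property the abstract emphasizes.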