Federated Learning of N-gram Language Models

Authors: Françoise Beaufays, Ananda Theertha Suresh, Michael Riley, Cyril Allauzen, Mingqing Chen

DOI:

Keywords:

Abstract: We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smart phones. It is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users' data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this model server-side with an n-gram model that can be deployed for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving them.
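To make the training setup concrete, below is a minimal sketch of the FederatedAveraging loop the abstract refers to: each round, clients take a few local gradient steps on their own data, and the server averages the resulting weights in proportion to client dataset sizes. The linear least-squares objective here is a hypothetical stand-in for the paper's recurrent language model; the function name and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def federated_averaging(global_weights, client_datasets, rounds=3, lr=0.1):
    """Sketch of FederatedAveraging: local SGD on each client, then a
    size-weighted average of the client models on the server."""
    w = np.array(global_weights, dtype=float)
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in client_datasets:
            w_local = w.copy()
            # A few local gradient steps on a least-squares objective
            # (stand-in for the paper's recurrent LM training).
            for _ in range(5):
                grad = 2 * X.T @ (X @ w_local - y) / len(y)
                w_local -= lr * grad
            updates.append(w_local)
            sizes.append(len(y))
        # Server step: weighted average of client models.
        # Only model weights are communicated; raw data stays on-device.
        w = np.average(updates, axis=0, weights=np.array(sizes, dtype=float))
    return w
```

With clients whose data is drawn from a shared underlying model, the averaged weights converge toward that model even though the server never sees any client's raw data, which is the privacy property the abstract emphasizes.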
