Alternating Multi-bit Quantization for Recurrent Neural Networks.

Authors: Hongbin Zha, Zhouchen Lin, Wenwu Ou, Chen Xu, Zhirong Wang

DOI:

Keywords: Quantization (signal processing), Optimization problem, Computer science, Binary code, Algorithm, Recurrent neural network, Feedforward neural network, Contextual image classification, Inference

Abstract: Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For server applications with large-scale concurrent requests, inference latency can also be critical because computing resources are costly. In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1, +1}. We formulate the quantization as an optimization problem …
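
The idea of representing a real-valued vector with multiple binary codes can be illustrated with a minimal sketch. It assumes a simple greedy residual scheme, w ≈ Σ_i alpha_i * b_i with each b_i in {-1, +1}^n, which is a generic baseline and not necessarily the optimization the authors formulate (the abstract above is truncated before that point). The function names `greedy_multibit_quantize` and `dequantize` are hypothetical, chosen for illustration only.

```python
import numpy as np

def greedy_multibit_quantize(w, k):
    """Approximate w by sum_i alphas[i] * codes[i], with codes[i] in {-1, +1}^n.

    Assumption: greedy residual fitting, used here only as an illustration.
    """
    residual = np.asarray(w, dtype=np.float64).copy()
    alphas, codes = [], []
    for _ in range(k):
        # Binary code: sign of the current residual (zeros mapped to +1).
        b = np.where(residual >= 0, 1.0, -1.0)
        # Least-squares optimal scale for this code is the mean absolute residual.
        alpha = np.abs(residual).mean()
        alphas.append(alpha)
        codes.append(b)
        residual -= alpha * b
    return np.array(alphas), np.stack(codes)

def dequantize(alphas, codes):
    """Reconstruct the real-valued approximation from scales and binary codes."""
    return alphas @ codes

w = np.array([0.9, -0.4, 0.1, -1.2])
alphas, codes = greedy_multibit_quantize(w, k=2)
w_hat = dequantize(alphas, codes)
```

In this sketch each additional code fits whatever the previous codes left unexplained, so the reconstruction error does not increase as `k` grows; the storage cost is `k` bits per entry plus `k` floating-point scales.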

References (26)
Mitch Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, vol. 19, pp. 313-330 (1993). DOI: 10.21236/ADA273556
Yunchao Gong, Lubomir D. Bourdev, Liu Liu, Ming Yang. Compressing Deep Convolutional Networks using Vector Quantization. arXiv: Computer Vision and Pattern Recognition (2014).
Yoshua Bengio, Matthieu Courbariaux, Jean-Pierre David. BinaryConnect: Training Deep Neural Networks with binary weights during propagations. arXiv: Learning (2015).
Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, Marianna Pensky. Sparse Convolutional Neural Networks. Computer Vision and Pattern Recognition, pp. 806-814 (2015). DOI: 10.1109/CVPR.2015.7298681
Tomas Mikolov, Marc'Aurelio Ranzato, Armand Joulin, Michael Mathieu, Sumit Chopra. Learning Longer Memory in Recurrent Neural Networks. arXiv: Neural and Evolutionary Computing (2014).
Benoît Colson, Patrice Marcotte, Gilles Savard. An overview of bilevel optimization. Annals of Operations Research, vol. 153, pp. 235-256 (2007). DOI: 10.1007/S10479-007-0176-2
Ivan Oseledets, Victor Lempitsky, Yaroslav Ganin, Vadim Lebedev, Maksim Rakhuba. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition. arXiv: Computer Vision and Pattern Recognition (2014).
Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton. Speech recognition with deep recurrent neural networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 6645-6649 (2013). DOI: 10.1109/ICASSP.2013.6638947
Bin Liu, Fengfu Li, Xiaoxing Wang, Bo Zhang, Junchi Yan. Ternary Weight Networks. arXiv: Computer Vision and Pattern Recognition (2016).
Shuchang Zhou, He Wen, Yuxin Wu, Yuheng Zou, Zekun Ni, Xinyu Zhou. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv: Neural and Evolutionary Computing (2016).