Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

Authors: Yaman Umuroglu, Nhan Tran, Nicholas J. Fraser, Javier M. Duarte, Benjamin Hawks

DOI:

Keywords:

Abstract: Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower latency to higher data throughput and more efficient energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra low latency applications targeting high energy physics use cases. However, the techniques developed here have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques such as regularization, batch normalization, and different pruning schemes on multiple computational or information efficiency metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to, or better than, standard neural architecture optimization techniques in terms of computational efficiency. While the accuracy on the benchmark may be similar, the information content of the network can vary significantly depending on the training configuration.
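Concretely, quantization-aware pruning interleaves the two techniques: the network is trained against fake-quantized (low-precision) weights while an increasing fraction of the smallest-magnitude weights is masked to zero between fine-tuning rounds. The following is a minimal PyTorch sketch of that idea, not the authors' actual training setup; the 6-bit width, the 30% per-round pruning fraction, the layer sizes, and the random stand-in data are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


class FakeQuantize(torch.autograd.Function):
    """Uniform symmetric fake quantization with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        # Round to the integer grid, then map back to float ("fake" quantization).
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # STE: treat the rounding as the identity so gradients flow through.
        return grad_out, None


class QuantLinear(nn.Linear):
    """Linear layer that sees low-precision weights during training."""

    def forward(self, x):
        return F.linear(x, FakeQuantize.apply(self.weight, 6), self.bias)


# Assumed toy model and data shapes, purely for illustration.
model = nn.Sequential(QuantLinear(16, 64), nn.ReLU(), QuantLinear(64, 5))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Iterative magnitude pruning interleaved with quantization-aware training:
# each round fine-tunes the quantized network, then zeroes a further 30% of
# the surviving weights; the pruning masks persist across rounds.
for _ in range(4):
    for _ in range(100):  # placeholder fine-tuning loop on random data
        x, y = torch.randn(32, 16), torch.randint(0, 5, (32,))
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    for layer in model:
        if isinstance(layer, QuantLinear):
            prune.l1_unstructured(layer, name="weight", amount=0.3)
```

Because torch.nn.utils.prune reparameterizes weight as weight_orig * mask, repeated calls accumulate masks, which gives the iterative pruning schedule; a real experiment would swap the random batches for the target dataset and add the regularization and batch-normalization variations the abstract describes.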
