
A. Quantization and Binarization
- reducing the number of bits required to represent each weight : k-means scalar quantization of the parameter values
- significant speed-up with minimal loss of accuracy : 8-bit quantization of the parameters
- reduced memory usage and floating-point operations with little loss in classification accuracy : 16-bit fixed-point representation in stochastic-rounding-based CNN training
- pruning the unimportant connections, retraining the network to learn the final weights for the remaining sparse connections, then quantizing the weights via weight sharing and applying Huffman coding to the quantized weights as well as the codebook to further reduce the rate (see the pipeline sketch after this list)
- minimizing the average Hessian-weighted quantization error to cluster the network parameters
- the extreme case of a 1-bit representation of each weight, that is, binary weight neural networks (see the binarization sketch after this list)
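To make the pruning / weight-sharing / Huffman-coding pipeline above concrete, here is a minimal NumPy sketch. The helper names (prune_small_weights, fit_codebook, huffman_code_lengths), the magnitude-threshold pruning rule, the tiny 1-D k-means, and the 16-entry codebook are illustrative assumptions rather than details of the referenced implementation; the retraining step is omitted.

import heapq
from collections import Counter

import numpy as np


def prune_small_weights(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights (magnitude pruning)."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)


def fit_codebook(values, n_clusters=16, n_iters=20):
    """Tiny 1-D k-means for weight sharing: returns (indices, codebook)."""
    codebook = np.linspace(values.min(), values.max(), n_clusters)
    for _ in range(n_iters):
        idx = np.argmin(np.abs(values[:, None] - codebook[None, :]), axis=1)
        for k in range(n_clusters):
            members = values[idx == k]
            if members.size:
                codebook[k] = members.mean()
    return idx, codebook


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} of a Huffman code over symbols."""
    freq = Counter(symbols)
    lengths = {s: 0 for s in freq}
    heap = [(f, [s]) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, group1 = heapq.heappop(heap)
        f2, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1          # every symbol in the merged subtree gains one bit
        heapq.heappush(heap, (f1 + f2, group1 + group2))
    return lengths


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=4096).astype(np.float32)

    pruned = prune_small_weights(w, sparsity=0.9)           # step 1: prune connections
    survivors = pruned[pruned != 0.0]                       # (retraining step omitted)
    idx, codebook = fit_codebook(survivors, n_clusters=16)  # step 2: weight sharing
    lengths = huffman_code_lengths(idx.tolist())            # step 3: Huffman coding

    total_bits = sum(lengths[s] for s in idx.tolist())
    print(f"avg bits per surviving weight: {total_bits / idx.size:.2f} (vs. 32-bit float)")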
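For the 1-bit case in the last item, a minimal sketch of binarizing a weight tensor is shown below. Scaling the signs by the mean absolute value is one common choice in binary-weight networks; the function name binarize_weights and that particular scaling rule are assumptions for illustration.

import numpy as np


def binarize_weights(w):
    """Replace each weight by its sign times a single per-tensor scale.

    Only 1 bit per weight (the sign) plus one float (the scale) needs to be
    stored; the scale limits the quantization error.
    """
    alpha = np.mean(np.abs(w))                      # per-tensor scaling factor
    return alpha * np.where(w >= 0, 1.0, -1.0)      # values in {-alpha, +alpha}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(128, 128)).astype(np.float32)
    wb = binarize_weights(w)
    print("unique values:", np.unique(wb))
    print("relative L2 error:", np.linalg.norm(w - wb) / np.linalg.norm(w))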
k-means scalar quantization
[6] Y. Gong, L. Liu, M. Yang, and L. D. Bourdev, “Compressing deep convolutional networks using vector quantization,” CoRR, vol. abs/1412.6115, 2014. (FAIR)
[7] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized convolutional neural networks for mobile devices,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
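A minimal sketch of k-means scalar quantization in the spirit of [6], [7]: every weight is treated as a 1-D point, the values are clustered, and only a per-weight cluster index plus a small codebook are stored. The use of scipy.cluster.vq and the 8-entry codebook are illustrative assumptions.

import numpy as np
from scipy.cluster.vq import kmeans, vq


def kmeans_quantize(weights, n_clusters=8):
    """Scalar-quantize a weight tensor with a k-means codebook.

    Each weight is mapped to the nearest of n_clusters centroids, so only an
    integer index per weight and the small codebook need to be stored.
    """
    flat = weights.reshape(-1, 1).astype(np.float64)   # treat each weight as a 1-D point
    codebook, _ = kmeans(flat, n_clusters)             # learn the centroids
    indices, _ = vq(flat, codebook)                    # nearest-centroid index per weight
    return indices.astype(np.uint8), codebook.ravel()


def dequantize(indices, codebook, shape):
    """Reconstruct approximate weights from indices and codebook."""
    return codebook[indices].reshape(shape).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)
    idx, cb = kmeans_quantize(w, n_clusters=8)          # 3 bits per weight instead of 32
    w_hat = dequantize(idx, cb, w.shape)
    print("relative L2 error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))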
8-bit quantization
[8] V. Vanhoucke, A. Senior, and M. Z. Mao, “Improving the speed of neural networks on CPUs,” in Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011, 2011.
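A minimal sketch of 8-bit quantization of a weight tensor, in the spirit of [8]; the symmetric per-tensor scale and the round-to-nearest mapping are illustrative assumptions.

import numpy as np


def quantize_int8(w):
    """Map float weights to int8 with a single symmetric per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0                 # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127)
    return q.astype(np.int8), scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512)).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    print("bytes: float32 =", w.nbytes, " int8 =", q.nbytes)   # 4x smaller
    print("relative L2 error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))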
16-bit fixed-point representation
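The 16-bit fixed-point training above depends on stochastic rounding, which keeps rounding unbiased so that small updates are not systematically lost. Below is a minimal sketch of stochastic rounding onto a fixed-point grid; the 12-fractional-bit split of the 16-bit word and the helper name stochastic_round_fixed_point are illustrative assumptions.

import numpy as np


def stochastic_round_fixed_point(x, frac_bits=12, word_bits=16, rng=None):
    """Round x to a signed fixed-point grid with stochastic rounding.

    A value between two grid points is rounded up with probability equal to
    its fractional distance to the lower point, so the rounding is unbiased
    in expectation (up to saturation at the ends of the range).
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits                      # grid spacing is 2**-frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    prob_up = scaled - floor                      # distance to the lower grid point
    rounded = floor + (rng.random(x.shape) < prob_up)
    limit = 2.0 ** (word_bits - 1) - 1            # saturate to the signed word range
    return np.clip(rounded, -limit - 1, limit) / scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(scale=0.01, size=100000)
    xq = stochastic_round_fixed_point(x, frac_bits=12, word_bits=16, rng=rng)
    print("mean rounding bias:", np.mean(xq - x))          # close to 0 in expectation
    print("max abs error:", np.max(np.abs(xq - x)))        # at most one grid step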