Quantizing deep convolutional networks for efficient inference: A whitepaper
Overview
Overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations.

Quantizer Design
Uniform Affine Quantizer
- Stores weights and activations at 8 bits of precision
- A naive implementation of convolution that adds the zero-point prior to the convolution leads to a 2x to 4x reduction in throughput due to the wider (16/32-bit) operands
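The affine mapping above can be sketched as follows (a minimal NumPy sketch, not code from the whitepaper; the function names and the unsigned 8-bit target range are assumptions for illustration):

```python
import numpy as np

def affine_quantize(x, scale, zero_point, num_bits=8):
    # Map real values to unsigned integers:
    #   x_q = clamp(round(x / scale) + zero_point, 0, 2^num_bits - 1)
    qmin, qmax = 0, 2 ** num_bits - 1
    x_q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return x_q.astype(np.uint8)

def affine_dequantize(x_q, scale, zero_point):
    # Recover an approximation of the real value:
    #   x ~= scale * (x_q - zero_point)
    return scale * (x_q.astype(np.float32) - zero_point)

# Example: quantize the range [-1, 1] to uint8.
x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
scale = 2.0 / 255.0          # (x_max - x_min) / (qmax - qmin)
zero_point = 128             # integer that maps to real 0
x_q = affine_quantize(x, scale, zero_point)
x_hat = affine_dequantize(x_q, scale, zero_point)
```

The round-trip error for in-range values is bounded by the step size `scale`, which is the usual correctness check for a uniform quantizer.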

Uniform symmetric quantizer
- A simplified version of the affine quantizer is the symmetric quantizer, which restricts the zero-point to 0
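With the zero-point fixed at 0, quantization becomes a pure scaling into a signed integer range. A minimal sketch under that assumption (function names and the signed int8 range are illustrative choices, not from the whitepaper):

```python
import numpy as np

def symmetric_quantize(x, scale, num_bits=8):
    # zero-point is fixed at 0, so the mapping is just round(x / scale),
    # clamped to the signed range [-2^(b-1), 2^(b-1) - 1].
    qmin = -2 ** (num_bits - 1)
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)

def symmetric_dequantize(x_q, scale):
    # No zero-point subtraction needed: x ~= scale * x_q
    return scale * x_q.astype(np.float32)

# A common choice of scale covers the max absolute value with 127 steps.
x = np.array([-0.5, 0.0, 0.5], dtype=np.float32)
scale = np.abs(x).max() / 127.0
x_q = symmetric_quantize(x, scale)
```

Dropping the zero-point removes the extra addition from the inner convolution loop, which is why symmetric quantization is attractive for weights.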


Stochastic quantizer: adds uniform noise in [-1/2, 1/2) before rounding; we do not consider stochastic quantization for inference, as most inference hardware does not support it
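For reference, the stochastic rounding idea can be sketched as below (an illustrative NumPy sketch under the symmetric, signed int8 assumptions above; the function name is hypothetical). The added noise makes the quantized value unbiased in expectation, which matters for training but not for deployment:

```python
import numpy as np

def stochastic_quantize(x, scale, num_bits=8, rng=None):
    # Round after adding uniform noise eps ~ U(-1/2, 1/2):
    #   x_q = clamp(round(x / scale + eps), qmin, qmax)
    # so E[x_q] = x / scale for values away from the clamp boundaries.
    rng = np.random.default_rng() if rng is None else rng
    qmin = -2 ** (num_bits - 1)
    qmax = 2 ** (num_bits - 1) - 1
    noise = rng.uniform(-0.5, 0.5, size=np.shape(x))
    return np.clip(np.round(x / scale + noise), qmin, qmax).astype(np.int8)
```

Averaging many stochastic quantizations of the same value recovers it, whereas deterministic rounding always returns the same nearest level.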