
# Model Quantization

Chinese Version

## I. Why Quantization

Quantization converts the main operators in the network (Convolution, Pooling, Binary, etc.) from the original floating-point precision to int8 precision, which reduces the model size and improves inference performance.

Note:

  1. For the KL quantization method, see: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

## II. Compile

### 1. Build

```
cd <path_to_tnn>/platforms/linux/
./build_quanttool.sh -c
```

### 2. Output

Binary of the quantization tool: `<path_to_tnn>/platforms/linux/build/quantization_cmd`
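If the build succeeded, a quick sanity check is to print the tool's usage message (this simply exercises the documented -h option):

```
<path_to_tnn>/platforms/linux/build/quantization_cmd -h
```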

## III. Usage

### 1. Command

```
./quantization_cmd [-h] [-p] <proto file> [-m] <model file> [-i] <input folder> [-b] <val> [-w] <val> [-n] <val> [-s] <val> [-r] <val> [-t] <val> [-o] <output_name>
```

### 2. Parameter Description

| option | description |
| --- | --- |
| -h, --help | Print the usage message. |
| -p, --proto | Specify the tnnproto model description file. |
| -m, --model | Specify the tnnmodel model parameter file. |
| -i, --input_path | Specify the path of the quantization input folder. Currently supported formats:<br>• text files (suffix .txt)<br>• common picture files (suffix .jpg, .jpeg, .png, .bmp)<br>All files under this directory will be used as input. |
| -b, --blob_method | Specify the feature-map (blob) quantization method:<br>• 0 min-max method (default)<br>• 2 KL method |
| -w, --weight_method | Specify the weight quantization method:<br>• 0 min-max method (default)<br>• 1 ADMM method |
| -n, --mean | Pre-processing: subtract the given mean from each channel of the input data. Parameter format: 0.0,0.0,0.0 |
| -s, --scale | Pre-processing: scale each channel of the input data. Parameter format: 1.0,1.0,1.0 |
| -r, --reverse_channel | Pre-processing, valid only for picture files:<br>• 0 use RGB order (default)<br>• 1 use BGR order |
| -t, --merge_type | Whether to quantize per tensor or per channel:<br>• 0 per-channel method (default)<br>• 1 mix method: weights per-channel, blobs per-tensor<br>• 2 per-tensor method |
| -o, --output | Specify the output name. |
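A hypothetical end-to-end invocation combining the options above; the model files, input folder, and output name are placeholders, not files shipped with TNN:

```
# Quantize with min-max blobs (-b 0), ADMM weights (-w 1),
# and the per-channel merge type (-t 0).
./quantization_cmd -p model.tnnproto -m model.tnnmodel \
    -i ./quant_input \
    -b 0 -w 1 -t 0 \
    -o model_int8
```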

### 3. Quantization Input

#### 3.1 Select input data

The input folder should contain data representative of the actual use case; otherwise the accuracy of the quantized model will suffer. Use at least 50 pictures. One way to assemble such a folder is sketched below.
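A minimal sketch for sampling an existing dataset into the quantization input folder, assuming a directory of .jpg files (all paths are placeholders):

```
# Copy 50 randomly chosen images into the quantization input folder.
mkdir -p quant_input
ls /path/to/dataset/*.jpg | shuf -n 50 | xargs -I{} cp {} quant_input/
```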

#### 3.2 Input preprocessing

The input data is preprocessed via the mean and scale parameters, using the formula:

```
input_pre = (input - mean) * scale
```
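A worked example, assuming hypothetical values mean = 128.0 and scale = 0.0078125 (= 1/128), which map 8-bit pixel values to roughly [-1, 1]; the model files and input folder are placeholders:

```
# input_pre = (255 - 128.0) * 0.0078125 =  0.9921875
# input_pre = (0   - 128.0) * 0.0078125 = -1.0
./quantization_cmd -p model.tnnproto -m model.tnnmodel -i ./quant_input \
    -n 128.0,128.0,128.0 -s 0.0078125,0.0078125,0.0078125
```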

### 4. Quantization Output

Two files will be generated in the directory where the command is executed:

- `model_quantized.tnnproto` -- the quantized model description file
- `model_quantized.tnnmodel` -- the quantized model parameter file

### 5. Notes

(1) The -n and -s parameters only take effect when the input consists of pictures.

(2) When the input is a picture, it is converted to RGB format internally for processing.

(3) When the input is a txt file, the data is stored in NCHW order as float values, one value per line, for a total of N×C×H×W lines (a generation sketch follows these notes). E.g.,

```
0.01
1.1
0.1
255.0
...
```

(4) The scale and mean values must be given as already-computed numbers; arithmetic expressions are not parsed. For example, 1.0/128.0 is invalid, while 0.0078125 is valid.
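A minimal sketch for generating a txt input in the layout described in note (3). The 1x3x224x224 shape and the file name are assumptions for illustration, and the random values only demonstrate the file format; real calibration inputs should come from representative data:

```
# Write N*C*H*W = 1*3*224*224 = 150528 float values, one per line, in NCHW order.
awk 'BEGIN { srand(); for (i = 0; i < 1 * 3 * 224 * 224; i++) printf "%f\n", rand() * 255 }' > input_0.txt
wc -l input_0.txt   # expect 150528
```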

### 6. Test Data

Some tests have been done with squeezenet1.1-7.onnx (download: https://github.com/onnx/models/blob/master/vision/classification/squeezenet/model/squeezenet1.1-7.onnx) on ImageNet (ILSVRC2012) (download: https://image-net.org/challenges/LSVRC/2012/).

The Top-1 accuracy of FP32 is 55.71%.

63 pictures were chosen from the data set as quantization inputs. The results are as follows:

| blob_method | weight_method | merge_type | Top-1 Accuracy |
| --- | --- | --- | --- |
| 2 (KL) | 1 (ADMM) | 0 (Per-Channel) | 51.58% |
| 2 (KL) | 1 (ADMM) | 2 (Per-Tensor) | 50.23% |
| 2 (KL) | 1 (ADMM) | 1 (Mix) | 55.37% |
| 0 (Min-Max) | 0 (Min-Max) | 0 (Per-Channel) | 54.82% |

Different configurations can be tried to achieve the best accuracy; an example invocation for one configuration follows.
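For instance, the mix configuration from the table above (the best Top-1 result in this test) corresponds to flags like the following; the file and folder names are placeholders:

```
# KL blobs (-b 2), ADMM weights (-w 1), mix merge type (-t 1).
./quantization_cmd -p squeezenet.tnnproto -m squeezenet.tnnmodel \
    -i ./quant_input -b 2 -w 1 -t 1
```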