nnieqat-pytorch

Nnieqat is a quantize aware training package for Neural Network Inference Engine(NNIE) on pytorch, it uses hisilicon quantization library to quantize module's weight and activation as fake fp32 format.

Installation

Supported Platforms: Linux
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.
Dependencies:
- python >= 3.5, < 4
- llvmlite >= 0.31.0
- pytorch >= 1.5
- numba >= 0.42.0
- numpy >= 1.18.1
Install nnieqat via pypi:
```
$ pip install nnieqat
```
Install nnieqat in docker(easy way to solve environment problems)：
```
$ cd docker
$ docker build -t nnieqat-image .
```

Install nnieqat via repo：

$ git clone https://github.com/aovoc/nnieqat-pytorch
$ cd nnieqat-pytorch
$ make install

Usage

add quantization hook.

quantize and dequantize weight and data with HiSVP GFPQ library in forward() process.

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
  register_quantization_hook(model)
...

merge bn weight into conv and freeze bn

suggest finetuning from a well-trained model, merge_freeze_bn at beginning. do it after a few epochs of training otherwise.

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.train()
    model = merge_freeze_bn(model)  #it will change bn to eval() mode during training
...

Unquantize weight before update it

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.apply(unquant_weight)  # using original weight while updating
    optimizer.step()
...

Dump weight optimized model

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.apply(quant_dequant_weight)
    save_checkpoint(...)
    model.apply(unquant_weight)
...

Using EMA with caution(Not recommended).

Code Examples

Cifar10 quantization aware training example (add nnieqat into pytorch_cifar10_tutorial)

python test/test_cifar10.py
ImageNet quantization finetuning example (add nnieqat into pytorh_imagenet_main.py)

python test/test_imagenet.py --pretrained path_to_imagenet_dataset

Results

ImageNet

python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # nnie_lr_e-4_ft

finetune result：

	trt_fp32	trt_int8	nnie
torchvision	0.56992	0.56424	0.56026
nnie_lr_e-3_ft	0.56600	0.56328	0.56612
lr_e-4_ft	0.57884	0.57502	0.57542
nnie_lr_e-4_ft	0.57834	0.57524	0.57730

coco

net: simplified yolov5s

train 300 epoches, hi3559 test result:

finetune 20 epoches, hi3559 test result:

Todo

Generate quantized model directly.

Reference

HiSVP 量化库使用指南

Quantizing deep convolutional networks for efficient inference: A whitepaper

8-bit Inference with TensorRT

Distilling the Knowledge in a Neural Network

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
docker		docker
docs		docs
nnieqat		nnieqat
src		src
tests		tests
LICENSE.txt		LICENSE.txt
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
build_helper.py		build_helper.py
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg
setup.py		setup.py

License

aovoc/nnieqat-pytorch

Folders and files

Latest commit

History

Repository files navigation

nnieqat-pytorch

Table of Contents

Installation

Usage

Code Examples

Results

Todo

Reference

About

Topics

Resources

License

Stars

Watchers

Forks

Languages