Examples

Intel® Neural Compressor provides validated examples covering multiple compression techniques, including quantization, pruning, knowledge distillation, and orchestration. Some of the validated cases are listed in the example tables below, and the release data is available here.

Quick Get Started Notebook Examples

Helloworld Examples

  • torch_llm: apply weight-only quantization to LLMs.
  • torch_non_llm: apply static quantization to non-LLMs.
  • tf_example1: quantize with the built-in dataloader and metric.
  • tf_example2: quantize a Keras model with a customized metric and dataloader.
  • tf_example3: convert a model with mixed precision.
  • tf_example4: quantize a checkpoint with a dummy dataloader.
  • tf_example5: configure performance and accuracy measurement.
  • tf_example6: use the default user-facing APIs to quantize a pb model.
  • tf_example7: quantize and benchmark with the pure Python API.
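Several of the helloworld examples rely on post-training static quantization, where a calibration dataloader is used to observe activation ranges before the model is converted to int8. The sketch below illustrates only the underlying arithmetic in plain Python. It does not use the Intel® Neural Compressor API, and the function names (`calibrate`, `affine_params`, `quantize`, `dequantize`) and the sample data are hypothetical.

```python
def calibrate(batches):
    """Return the (min, max) range observed over all calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    return min(lo, 0.0), max(hi, 0.0)


def affine_params(lo, hi, qmin=0, qmax=255):
    """Derive a scale and zero point for asymmetric uint8 quantization."""
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers and clip them to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical calibration data standing in for a calibration dataloader.
calibration_batches = [[-1.0, 0.5, 2.0], [0.0, 1.5, 3.0]]
lo, hi = calibrate(calibration_batches)
scale, zero_point = affine_params(lo, hi)

inputs = [-1.0, 0.0, 3.0]
q = quantize(inputs, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q, restored)
```

Because the scale and zero point are fixed after calibration, inference needs no range computation at run time. Dynamic quantization differs in that it computes activation ranges on the fly for each input.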

TensorFlow Examples

Quantization

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet50 V1.0 | Image Recognition | Post-Training Static Quantization | pb |
| ResNet50 V1.5 | Image Recognition | Post-Training Static Quantization | pb / keras |
| ResNet101 | Image Recognition | Post-Training Static Quantization | pb / keras |
| MobileNet V1 | Image Recognition | Post-Training Static Quantization | pb |
| MobileNet V2 | Image Recognition | Post-Training Static Quantization | pb / keras |
| MobileNet V3 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V1 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V2 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V3 | Image Recognition | Post-Training Static Quantization | pb / keras |
| Inception V4 | Image Recognition | Post-Training Static Quantization | pb |
| Inception ResNet V2 | Image Recognition | Post-Training Static Quantization | pb / keras |
| VGG16 | Image Recognition | Post-Training Static Quantization | pb / keras |
| VGG19 | Image Recognition | Post-Training Static Quantization | pb / keras |
| ResNet V2 50 | Image Recognition | Post-Training Static Quantization | pb / keras |
| ResNet V2 101 | Image Recognition | Post-Training Static Quantization | pb / keras |
| ResNet V2 152 | Image Recognition | Post-Training Static Quantization | pb |
| DenseNet121 | Image Recognition | Post-Training Static Quantization | pb |
| DenseNet161 | Image Recognition | Post-Training Static Quantization | pb |
| DenseNet169 | Image Recognition | Post-Training Static Quantization | pb |
| EfficientNet B0 | Image Recognition | Post-Training Static Quantization | ckpt |
| Xception | Image Recognition | Post-Training Static Quantization | keras |
| ResNet V2 | Image Recognition | Quantization-Aware Training | keras |
| EfficientNet V2 B0 | Image Recognition | Post-Training Static Quantization | SavedModel |
| BERT base MRPC | Natural Language Processing | Post-Training Static Quantization | ckpt |
| BERT large SQuAD (Model Zoo) | Natural Language Processing | Post-Training Static Quantization | pb |
| BERT large SQuAD | Natural Language Processing | Post-Training Static Quantization | pb |
| DistilBERT base | Natural Language Processing | Post-Training Static Quantization | pb |
| Transformer LT | Natural Language Processing | Post-Training Static Quantization | pb |
| Transformer LT MLPerf | Natural Language Processing | Post-Training Static Quantization | pb |
| SSD ResNet50 V1 | Object Detection | Post-Training Static Quantization | pb / ckpt |
| SSD MobileNet V1 | Object Detection | Post-Training Static Quantization | pb / ckpt |
| Faster R-CNN Inception ResNet V2 | Object Detection | Post-Training Static Quantization | pb / SavedModel |
| Faster R-CNN ResNet101 | Object Detection | Post-Training Static Quantization | pb / SavedModel |
| Faster R-CNN ResNet50 | Object Detection | Post-Training Static Quantization | pb |
| Mask R-CNN Inception V2 | Object Detection | Post-Training Static Quantization | pb / ckpt |
| SSD ResNet34 | Object Detection | Post-Training Static Quantization | pb |
| YOLOv3 | Object Detection | Post-Training Static Quantization | pb |
| Wide & Deep | Recommendation | Post-Training Static Quantization | pb |
| Arbitrary Style Transfer | Style Transfer | Post-Training Static Quantization | ckpt |
| OPT | Natural Language Processing | Post-Training Static Quantization | pb (smooth quant) |
| GPT2 | Natural Language Processing | Post-Training Static Quantization | pb (smooth quant) |
| ViT | Image Recognition | Post-Training Static Quantization | pb |
| GraphSage | Graph Networks | Post-Training Static Quantization | pb |
| EleutherAI/gpt-j-6B | Natural Language Processing | Post-Training Static Quantization | saved_model (smooth quant) |

Distillation

| Student Model | Teacher Model | Domain | Approach | Examples |
| --- | --- | --- | --- | --- |
| MobileNet | DenseNet201 | Image Recognition | Knowledge Distillation | pb |
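Knowledge distillation trains a small student model to match the softened output distribution of a larger teacher. The following is a minimal sketch of a commonly used distillation loss in plain Python, not the loss implemented by the linked example. The function names and the default `temperature` and `alpha` values are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft teacher-matching term."""
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Hypothetical logits for a 3-class problem.
loss = distillation_loss([1.0, 2.0, 0.5], [0.5, 3.0, 0.2], label=1)
print(loss)
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the hard-label term remains.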

Pruning

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet V2 | Image Recognition | Structured (4x1, 2in4) | keras |
| ViT | Image Recognition | Structured (4x1, 2in4) | keras |
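The "2in4" pattern in the table above refers to structured sparsity in which two of every four consecutive weights are zero. As a rough illustration, the sketch below keeps the two largest-magnitude weights in each group of four. Magnitude-based selection is an assumption made here for simplicity, and the linked examples may use a different importance criterion. The function name `prune_2in4` is hypothetical.

```python
def prune_2in4(weights):
    """Zero the two smallest-magnitude weights in every group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two weights with the largest absolute value.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.1, -0.9, 0.5, 0.2, 1.0, 0.0, -0.3, 0.4]
print(prune_2in4(row))
```

Every group of four ends up with exactly two nonzero slots, so the result is 50% sparse with a regular layout.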

Model Export

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet50 V1.5 | Image Recognition | TF2ONNX | int8 fp32 |

PyTorch Examples

Quantization

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet18 | Image Recognition | Post-Training Static Quantization | fx / ipex |
| ResNet18 | Image Recognition | Quantization-Aware Training | fx |
| ResNet50 | Image Recognition | Post-Training Static Quantization | fx / ipex |
| ResNet50 | Image Recognition | Quantization-Aware Training | fx |
| ResNeXt101_32x16d_wsl | Image Recognition | Post-Training Static Quantization | ipex |
| ResNeXt101_32x8d | Image Recognition | Post-Training Static Quantization | fx |
| Se_ResNeXt50_32x4d | Image Recognition | Post-Training Static Quantization | fx |
| Inception V3 | Image Recognition | Post-Training Static Quantization | fx |
| MobileNet V2 | Image Recognition | Post-Training Static Quantization | fx |
| PeleeNet | Image Recognition | Post-Training Static Quantization | fx |
| ResNeSt50 | Image Recognition | Post-Training Static Quantization | fx |
| 3D-UNet | Image Recognition | Post-Training Static Quantization | fx |
| SSD ResNet34 | Object Detection | Post-Training Static Quantization | fx / ipex |
| YOLOv3 | Object Detection | Post-Training Static Quantization | fx |
| Mask R-CNN | Object Detection | Post-Training Static Quantization | fx |
| DLRM | Recommendation | Post-Training Static Quantization | ipex / fx |
| HuBERT | Speech Recognition | Post-Training Static Quantization | fx |
| HuBERT | Speech Recognition | Post-Training Dynamic Quantization | fx |
| RNNT | Speech Recognition | Post-Training Dynamic Quantization | fx |
| BlendCNN | Natural Language Processing | Post-Training Static Quantization | ipex |
| bert-large-uncased-whole-word-masking-finetuned-squad | Natural Language Processing | Post-Training Static Quantization | fx / ipex(xpu) |
| distilbert-base-uncased-distilled-squad | Natural Language Processing | Post-Training Static Quantization | ipex |
| yoshitomo-matsubara/bert-large-uncased-rte | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| Intel/xlm-roberta-base-mrpc | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| textattack/distilbert-base-uncased-MRPC | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| textattack/albert-base-v2-MRPC | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| Intel/xlm-roberta-base-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
| yoshitomo-matsubara/bert-large-uncased-rte | Natural Language Processing | Post-Training Static Quantization | fx |
| Intel/bert-base-uncased-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
| textattack/bert-base-uncased-CoLA | Natural Language Processing | Post-Training Static Quantization | fx |
| textattack/bert-base-uncased-STS-B | Natural Language Processing | Post-Training Static Quantization | fx |
| gchhablani/bert-base-cased-finetuned-sst2 | Natural Language Processing | Post-Training Static Quantization | fx |
| ModelTC/bert-base-uncased-rte | Natural Language Processing | Post-Training Static Quantization | fx |
| textattack/bert-base-uncased-QNLI | Natural Language Processing | Post-Training Static Quantization | fx |
| yoshitomo-matsubara/bert-large-uncased-cola | Natural Language Processing | Post-Training Static Quantization | fx |
| textattack/distilbert-base-uncased-MRPC | Natural Language Processing | Post-Training Static Quantization | fx |
| Intel/xlnet-base-cased-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
| textattack/roberta-base-MRPC | Natural Language Processing | Post-Training Static Quantization | fx |
| Intel/camembert-base-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
| t5-small | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| Helsinki-NLP/opus-mt-en-ro | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| lvwerra/pegasus-samsum | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| google/reformer-crime-and-punishment | Natural Language Processing | Post-Training Static Quantization | fx |
| EleutherAI/gpt-j-6B | Natural Language Processing | Post-Training Static Quantization | fx / smooth quant |
| EleutherAI/gpt-j-6B | Natural Language Processing | Post-Training Weight Only Quantization | weight_only |
| abeja/gpt-neox-japanese-2.7b | Natural Language Processing | Post-Training Static Quantization | fx |
| bigscience/bloom | Natural Language Processing | Post-Training Static Quantization | smooth quant |
| facebook/opt | Natural Language Processing | Post-Training Static Quantization | smooth quant |
| SD Diffusion | Text to Image | Post-Training Static Quantization | fx |
| openai/whisper-large | Speech Recognition | Post-Training Dynamic Quantization | fx |
| torchaudio/wav2vec2 | Speech Recognition | Post-Training Dynamic Quantization | fx |

Quantization with Intel® Extension for Transformers based on Intel® Neural Compressor

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| T5 Large | Natural Language Processing | Post-Training Dynamic Quantization | fx |
| Flan T5 Large | Natural Language Processing | Post-Training Dynamic / Static Quantization | fx |

Pruning

| Model | Domain | Pruning Type | Approach | Examples |
| --- | --- | --- | --- | --- |
| Distilbert-base-uncased | Natural Language Processing (text classification) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
| Bert-mini | Natural Language Processing (text classification) | Structured (4x1, 2in4, per channel), Unstructured | Snip-momentum | eager |
| Distilbert-base-uncased | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
| Bert-mini | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
| Bert-base-uncased | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
| Bert-large | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
| Flan-T5-small | Natural Language Processing (translation) | Structured (4x1) | Snip-momentum | eager |
| YOLOv5s6 | Object Detection | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
| ResNet50 | Image Recognition | Structured (2x1) | Snip-momentum | eager |
| Bert-base | Question Answering | Structured (channel, multi-head attention) | Snip-momentum | eager |
| Bert-large | Question Answering | Structured (channel, multi-head attention) | Snip-momentum | eager |

Distillation

| Student Model | Teacher Model | Domain | Approach | Examples |
| --- | --- | --- | --- | --- |
| CNN-2 | CNN-10 | Image Recognition | Knowledge Distillation | eager |
| MobileNet V2-0.35 | WideResNet40-2 | Image Recognition | Knowledge Distillation | eager |
| ResNet18\|ResNet34\|ResNet50\|ResNet101 | ResNet18\|ResNet34\|ResNet50\|ResNet101 | Image Recognition | Knowledge Distillation | eager |
| ResNet18\|ResNet34\|ResNet50\|ResNet101 | ResNet18\|ResNet34\|ResNet50\|ResNet101 | Image Recognition | Self Distillation | eager |
| VGG-8 | VGG-13 | Image Recognition | Knowledge Distillation | eager |
| BlendCNN | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
| DistilBERT | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
| BiLSTM | RoBERTa-Base | Natural Language Processing | Knowledge Distillation | eager |
| TinyBERT | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
| BERT-3 | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
| DistilRoBERTa | RoBERTa-Large | Natural Language Processing | Knowledge Distillation | eager |

Orchestration

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet50 | Image Recognition | Multi-shot: Pruning and PTQ | link |
| ResNet50 | Image Recognition | One-shot: QAT during Pruning | link |
| Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | Natural Language Processing (question-answering) | One-shot: Pruning, Distillation and QAT | link |
| Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | Natural Language Processing (text-classification) | One-shot: Pruning, Distillation and QAT | link |

Model Export

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet18 | Image Recognition | PT2ONNX | int8 fp32 |
| ResNet50 | Image Recognition | PT2ONNX | int8 fp32 |
| bert base MRPC | Natural Language Processing | PT2ONNX | int8 fp32 |
| bert large MRPC | Natural Language Processing | PT2ONNX | int8 fp32 |

ONNX Runtime Examples

Quantization

| Model | Domain | Approach | Examples |
| --- | --- | --- | --- |
| ResNet50 V1.5 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ResNet50 V1.5 MLPerf | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| VGG16 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| MobileNet V2 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| MobileNet V3 MLPerf | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| AlexNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| CaffeNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| DenseNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops |
| EfficientNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| FCN (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| GoogleNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| Inception V1 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| MNIST (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops |
| MobileNet V2 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ResNet50 V1.5 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ShuffleNet V2 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| SqueezeNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| VGG16 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ZFNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ArcFace (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops |
| BEiT | Image Recognition | Post-Training Static Quantization | qlinearops |
| CodeBert | Natural Language Processing | Post-Training Static Quantization | qlinearops |
| CodeBert | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| BERT base MRPC | Natural Language Processing | Post-Training Static Quantization | integerops / qdq |
| BERT base MRPC | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| DistilBERT base MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| Mobile bert MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| Roberta base MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| BERT SQuAD (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| GPT2 lm head WikiText (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| MobileBERT SQuAD MLPerf (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| BiDAF (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| BERT base uncased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Roberta base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| XLM Roberta base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Camembert base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| MiniLM L12 H384 uncased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| DistilBERT base uncased SST-2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Albert base v2 SST-2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| MiniLM L6 H384 uncased SST-2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| BERT base cased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Electra small discriminator MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| BERT mini MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Xlnet base cased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| BART large MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| DeBERTa v3 base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Spanbert SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Bert base multilingual cased SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| DistilBert base uncased SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| BERT large uncased whole word masking SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Roberta large SQuAD v2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| GPT2 WikiText (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| DistilGPT2 WikiText (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| LayoutLMv3 FUNSD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| LayoutLMv2 FUNSD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| LayoutLM FUNSD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| SSD MobileNet V1 | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| SSD MobileNet V2 | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| Table Transformer Structure Recognition | Object Detection | Post-Training Static Quantization | qlinearops |
| Table Transformer Detection | Object Detection | Post-Training Static Quantization | qlinearops |
| SSD MobileNet V1 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| DUC (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
| Faster R-CNN (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| Mask R-CNN (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| SSD (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| Tiny YOLOv3 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
| YOLOv3 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
| YOLOv4 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
| Emotion FERPlus (ONNX Model Zoo) | Body Analysis | Post-Training Static Quantization | qlinearops |
| Ultra Face (ONNX Model Zoo) | Body Analysis | Post-Training Static Quantization | qlinearops |
| GPT-J-6B (HuggingFace) | Text Generation | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
| Llama-7B (HuggingFace) | Text Generation | Static / Weight Only Quantization | qlinearops / weight_only |
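Weight-only quantization, listed for the LLM entries above, compresses only the weights to low-bit integers while activations stay in floating point. The sketch below shows one common variant, symmetric group-wise int4 quantization, in plain Python. It is an illustration of the idea under assumed settings (`group_size=4`, `bits=4`), not the algorithm used by the linked examples, and the function names are hypothetical.

```python
def quantize_groupwise(weights, group_size=4, bits=4):
    """Quantize weights symmetrically with one scale per group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit symmetric quantization
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        quantized.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return quantized, scales


def dequantize_groupwise(quantized, scales, group_size=4):
    """Recover approximate float weights using each group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


# Two groups with very different magnitudes share the same integer codes
# but carry different scales.
weights = [0.7, -0.3, 0.1, 0.0, 14.0, -6.0, 2.0, 0.0]
q, scales = quantize_groupwise(weights)
restored = dequantize_groupwise(q, scales)
print(q, scales, restored)
```

Keeping a separate scale for each group limits the damage that a single large weight does to the precision of the others, at the cost of storing extra scale values.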

Notebook Examples