Neural Network Compression Framework for enhanced OpenVINO™ inference
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
A Tutorial Notebook on Quantization in Machine Learning
Implementation of MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation
Quantization notebooks (adapted from and for Mobile Apps w/ Machine Learning, by Dara Varam and Lujain Khalil)
EfficientNetV2 (EfficientNetV2-B2) with INT8 and FP32 quantization (QAT and PTQ) on the CK+ dataset, including fine-tuning, augmentation, and handling of class imbalance.
Tutorial notebooks for hls4ml
A lightweight Convolutional Autoencoder for recognizing Bangla font styles, along with quantization for deployment on resource-constrained IoT devices.
Training neural networks with quantized weights at arbitrarily specified bit-depths
0️⃣1️⃣🤗 BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture
Quantization simulation of neural networks with PyTorch
Quantization Aware Training
A model compression and acceleration toolbox based on PyTorch.
Classify alcohols and their snacks
Notes on quantization in neural networks
Disentangle joint continuous and discrete representations for Anomaly Detection in High Energy Physics.
A comprehensive study of the quantization of various CNN models, employing techniques such as Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT); a minimal sketch of both workflows follows below.
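Several of the entries above contrast post-training quantization with quantization-aware training. As a rough illustration only, the sketch below shows both workflows on a toy model using PyTorch's eager-mode torch.ao.quantization API; the TinyCNN model, the random calibration and training data, and the hyperparameters are placeholder assumptions, not code from any of the listed repositories.

```python
# Minimal sketch: eager-mode post-training static quantization (PTQ) vs.
# quantization-aware training (QAT) in PyTorch. Model and data are toy
# placeholders; real projects would fuse modules and tune configs.
import torch
import torch.nn as nn
from torch.ao import quantization as tq

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> int8 boundary
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 10)
        self.dequant = tq.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.pool(self.relu(self.conv(x))).flatten(1)
        x = self.fc(x)
        return self.dequant(x)

def ptq(model, calibration_batches):
    """Post-training static quantization: calibrate with observers, then convert."""
    model.eval()
    model.qconfig = tq.get_default_qconfig("fbgemm")
    tq.prepare(model, inplace=True)           # insert observers
    with torch.no_grad():
        for x in calibration_batches:         # collect activation statistics
            model(x)
    return tq.convert(model)                  # swap in int8 kernels

def qat(model, train_batches, epochs=1):
    """Quantization-aware training: fine-tune with fake-quant, then convert."""
    model.train()
    model.qconfig = tq.get_default_qat_qconfig("fbgemm")
    tq.prepare_qat(model, inplace=True)       # insert fake-quant modules
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_batches:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    model.eval()
    return tq.convert(model)

if __name__ == "__main__":
    calib = [torch.randn(4, 3, 32, 32) for _ in range(2)]
    train = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))) for _ in range(2)]
    int8_ptq_model = ptq(TinyCNN(), calib)
    int8_qat_model = qat(TinyCNN(), train)
```

The key difference the listed projects exploit is visible here: PTQ only observes activations after training, while QAT simulates quantization error during fine-tuning so the weights can adapt to it, which typically matters most at low bit-widths.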