AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Fast inference engine for Transformer models
Unify Efficient Fine-Tuning of 100+ LLMs
Self-created tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive transpose-extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a star; give me a pull request.
Neural Network Compression Framework for enhanced OpenVINO™ inference
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Official implementation of Half-Quadratic Quantization (HQQ)
Faster Whisper transcription with CTranslate2
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
This is the official implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models". It is also an efficient LLM compression tool offering various advanced compression methods and supporting multiple inference backends.
[CVPR 2024 Highlight] TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
A Python package that extends the official PyTorch to easily obtain performance gains on Intel platforms
KGy SOFT Drawing is a library for advanced image, icon and graphics handling.
Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.
Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
a friendly neighborhood repository with diverse experiments and adventures in the world of LLMs
Learn to compress models through methods such as quantization, making them more efficient, faster, and more accessible
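The core idea shared by the quantization tools above can be sketched in a few lines of plain Python: map floats to low-bit integer codes with a scale and zero point, then dequantize to recover an approximation. This is an illustrative sketch only; the function names are hypothetical, and real toolkits (AIMET, MCT, neural-compressor, and the others listed here) add calibration, per-channel scales, and fused low-bit kernels on top of this idea.

```python
def quantize_int8(values):
    """Affine (asymmetric) int8 quantization: floats -> int8 codes."""
    lo, hi = min(values), max(values)
    if lo == hi:  # degenerate range: everything maps to a single code
        return [0] * len(values), 1.0, 0
    scale = (hi - lo) / 255.0  # int8 has 256 levels: [-128, 127]
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(code - zero_point) * scale for code in q]

weights = [-0.51, 0.0, 0.23, 0.97]
q, s, z = quantize_int8(weights)
recon = dequantize_int8(q, s, z)
# Per-value reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(weights, recon))
```

Storing the codes as int8 instead of float32 cuts memory 4x; lower-bit schemes (INT4/NF4) trade more error for further savings.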
Documentation of my notes, learnings, presentations on Computer vision and some other cool stuff
GRAG is a simple Python package that provides an easy end-to-end solution for implementing Retrieval-Augmented Generation (RAG). The package offers an easy way to run various LLMs locally, thanks to LlamaCpp, and it also supports vector stores such as Chroma and DeepLake.