[FEATURE] Support BitBLAS Backend for QuantLinear #662

Open · LeiWang1999 wants to merge 8 commits into base: main
Conversation

LeiWang1999 (Contributor)
Hello everyone, we recently published BitBLAS, a library that supports mixed-precision BLAS operations on GPUs. This pull request integrates it as an inference kernel for the quantized linear layer (QuantLinear) in AutoGPTQ.

We hope this can help us explore something fascinating.

The repo for BitBLAS is: https://github.com/microsoft/BitBLAS

Our benchmark results:

  • End-to-end integration with the quantized inference kernel for AutoGPTQ and vLLM.
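For readers unfamiliar with BitBLAS, here is a minimal standalone sketch of its mixed-precision matmul operator, adapted from the BitBLAS quick-start. The shapes, dtypes, and quantization settings are illustrative assumptions only; the QuantLinear integration in this PR builds the equivalent configuration from the layer's actual in/out features, bit width, and group size inside AutoGPTQ.

```python
import torch
import bitblas

# Configure a mixed-precision GEMM: float16 activations x int4 weights.
# Values here are illustrative; they are not the ones used by the PR.
matmul_config = bitblas.MatmulConfig(
    M=1,               # number of tokens in the activation
    N=1024,            # output features
    K=1024,            # input features
    A_dtype="float16",
    W_dtype="int4",
    accum_dtype="float16",
    out_dtype="float16",
    layout="nt",       # activation row-major, weight transposed
    with_bias=False,
)
matmul = bitblas.Matmul(config=matmul_config)

# A random activation and a plain int8 weight standing in for quantized weights.
activation = torch.rand((1, 1024), dtype=torch.float16).cuda()
weight = torch.randint(0, 7, (1024, 1024), dtype=torch.int8).cuda()

# Repack the weight into BitBLAS's int4 storage format, then run the kernel.
packed_weight = matmul.transform_weight(weight)
output = matmul(activation, packed_weight)
print(output.shape)  # torch.Size([1, 1024])
```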

LeiWang1999 changed the title from "[FEATURE] Support BitBLAS Backend for low precision kernel" to "[FEATURE] Support BitBLAS Backend for QuantLinear" on May 1, 2024
LeiWang1999 (Contributor, Author)

CC @PanQiWei @qwopqwop200

qwopqwop200 (Collaborator)

Currently, this project is mainly managed by @fxmarty.

LeiWang1999 (Contributor, Author)

Thanks @Qubitium, please cc @fxmarty.
