[FEATURE] Support BitBLAS Backend for QuantLinear #662

LeiWang1999 · 2024-05-01T18:38:40Z

Hello everyone, we recently published a library that supports mixed-precision BLAS operations on GPUs. This pull request integrates it as an inference kernel for the quantized linear layer (QuantLinear) in AutoGPTQ.

We hope this helps us explore something fascinating.

The repo for BitBLAS is: https://github.com/microsoft/BitBLAS

Our benchmark results:

[Figure: end-to-end integration with the quantized inference kernel for AutoGPTQ and vLLM.]

LeiWang1999 · 2024-05-02T04:29:09Z

CC @PanQiWei @qwopqwop200

qwopqwop200 · 2024-05-02T09:52:55Z

Currently, this project is mainly managed by fxmarty.
@fxmarty

LeiWang1999 · 2024-05-03T13:16:00Z

Thanks, @Qubitium. Please cc @fxmarty.

LeiWang1999 added 8 commits March 10, 2024 22:38

support bitblas.

605b114

Add use_bitblas flag to load_model_tokenizer function and main function

9fddb8e

Fix bitblas backend initialization and matrix multiplication

30e6cc0

Remove print statement and save quantized model

7505e60

Fix bitblas backend initialization and matrix multiplication

08a269b

Add pytest.ini and MANIFEST.in files, and update GPTQ module imports

205f74a

BitBLAS Support

29a9b70

revert example

cf538d3

LeiWang1999 changed the title ~~[FEATURE] Support BitBLAS Backend for low precision kernel~~ [FEATURE] Support BitBLAS Backend for QuantLinear May 1, 2024
