Replies: 4 comments 5 replies
-
There is zero prior progress to my knowledge. AutoGPTQ relies on three types of C/C++ kernels in order to do quantisation, none of which currently build or run on macOS.
In order to get macOS support, which I know would be appreciated by a lot of people, you would need to write an equivalent kernel, which I assume would make use of PyTorch's MPS backend, which provides GPU acceleration on macOS systems.

I have a lot of experience with using and quantising GPTQ models, and I have both an Intel and an Apple Silicon macOS system (MBP 2023 M2 Ultra), so I would be glad to help with testing. But I can't help with the coding, I'm afraid.

I don't know if you have any experience with llama.cpp, but they have full GPU-accelerated Apple Silicon support. They use a different quantisation method and they don't use PyTorch at all, but it could perhaps be useful for reference, as a source of info on GPU-accelerated quantised LLM inference on Apple Silicon from C++.

Let me know if you decide to take on the project!
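To make the "equivalent kernel" idea concrete, here is a minimal sketch (not part of AutoGPTQ) of the pure-PyTorch fallback path such a port might start from: unpack 4-bit GPTQ-style weights and do the matmul on the `mps` device. The packing layout (8 nibbles per int32, per-group scales/zeros) is an assumption for illustration; AutoGPTQ's real CUDA/Triton kernels fuse dequantisation into the matmul rather than materialising fp16 weights, and MPS op coverage for bitwise shifts varies by PyTorch version.

```python
# Hypothetical sketch: dequantize 4-bit GPTQ-style packed weights in pure
# PyTorch so the matmul can run on the MPS device. Layout is assumed:
# 8 x 4-bit values packed per int32 along the input dimension.
import torch

def dequant_4bit(qweight: torch.Tensor,   # int32 [in_features // 8, out_features]
                 scales: torch.Tensor,    # fp16  [n_groups, out_features]
                 zeros: torch.Tensor,     # fp16  [n_groups, out_features]
                 group_size: int = 128) -> torch.Tensor:
    """Unpack 8 nibbles from each int32 and apply per-group scale/zero-point."""
    shifts = torch.arange(0, 32, 4, device=qweight.device, dtype=torch.int32)
    # [in//8, 1, out] >> [1, 8, 1] -> [in//8, 8, out] -> [in, out]
    w = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF
    w = w.reshape(-1, qweight.shape[1]).to(scales.dtype)
    groups = torch.arange(w.shape[0], device=w.device) // group_size
    return (w - zeros[groups]) * scales[groups]

device = "mps" if torch.backends.mps.is_available() else "cpu"
info = torch.iinfo(torch.int32)
qweight = torch.randint(info.min, info.max, (512 // 8, 256),
                        dtype=torch.int32, device=device)
scales = torch.rand(512 // 128, 256, device=device, dtype=torch.float16)
zeros = torch.full((512 // 128, 256), 8.0, device=device, dtype=torch.float16)

x = torch.randn(1, 512, device=device, dtype=torch.float16)
y = x @ dequant_4bit(qweight, scales, zeros)   # matmul runs on the GPU via MPS
print(y.shape)  # torch.Size([1, 256])
```

A real port would want the fused dequant-matmul written as a Metal kernel for performance; this only demonstrates correctness of the unpack-then-matmul path.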
-
Hi. Any updates on this?
-
Hi @ozak, unfortunately not. Maybe some folks from Apple have had a look at mixed-precision quantization in https://github.com/ml-explore/mlx?
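For anyone exploring that route, here is a minimal sketch of MLX's built-in group quantization API (signatures may differ across MLX versions; check the current docs): round-trip a weight matrix through 4-bit group quantization and run the fused quantized matmul on Apple Silicon.

```python
# Sketch of MLX group quantization on Apple Silicon (API subject to change).
import mlx.core as mx

w = mx.random.normal((256, 512))                    # fp32 weights [out, in]
wq, scales, biases = mx.quantize(w, group_size=64, bits=4)

x = mx.random.normal((1, 512))
# Fused kernel: computes x @ w.T directly from the packed 4-bit weights.
y = mx.quantized_matmul(x, wq, scales, biases,
                        transpose=True, group_size=64, bits=4)
print(y.shape)  # (1, 256)

# Recover an approximate fp32 copy to measure quantization error.
w_hat = mx.dequantize(wq, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_hat).max())
```

Note this is MLX's own round-to-nearest group quantization, not GPTQ; porting GPTQ's error-compensating quantisation procedure on top of it would be a separate effort.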
-
The issue is that many models on Hugging Face use
-
Hi folks, I'm looking to install the auto-gptq packages in a conda environment on a new Apple MacBook Pro M3 Max (128 GB). PyTorch is working fine, but it looks like auto-gptq needs dev work to be compatible with Apple Silicon?

Any suggestions on where to begin? I'm willing to help implement this, but I need details on prior progress, if any, before I spend countless days diving into the code.
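As a first sanity check (standard PyTorch APIs, nothing auto-gptq-specific), it's worth confirming the MPS backend actually works in the conda environment before digging into the kernel code:

```python
# Verify the PyTorch build in this env has a working MPS backend.
import torch

print(torch.__version__)
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    x = torch.randn(1024, 1024, device="mps")
    print((x @ x).sum().item())   # forces execution on the GPU
```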
Respectfully,