
Allow specifying GPU used for quantisation, overriding hardcoded cuda:0 #405

Open · wants to merge 1 commit into base: main
Conversation

TheBloke (Contributor) commented Nov 5, 2023

This simple change adds a new parameter to model.quantize(), called cuda_GPU, defaulting to 0.

When specified, this overrides the hardcoded cuda:0 which has been in the codebase since the dawn of time.

This will allow me to easily direct multiple GPTQ quantisations to separate GPUs on a multi-GPU system.

Until now I've achieved that by launching a separate script and setting CUDA_VISIBLE_DEVICES=x in the environment when calling that script.

But now I'd like to handle this with an import instead, for various reasons. I initially tried setting CUDA_VISIBLE_DEVICES in the module being imported, but I hit a scary PyTorch error when I tried this: RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=1, num_gpus=. EDIT: I've now solved that torch error!

I have cumulatively spent about 4 hours trying to debug and fix that, with no success. Then I wondered how hard it'd be to change AutoGPTQ to take the GPU param instead... and that took 4 minutes 😢 😁

Anyway, I hope this simple change is good enough. There are probably better ways to do it, but it at least has to be better than hardcoding CUDA_0, as has been the case forever.

TheBloke (Contributor, Author) commented Nov 5, 2023

Isn't this always the way?

5 minutes after making that PR, and after 4 hours 10 minutes total time on this issue, I figured out how to fix it so I can use CUDA_VISIBLE_DEVICES=x like I had been trying to do (on and off) for weeks.

The problem: I had a stray import torch I'd forgotten about, pulled in from a submodule of a submodule. Because torch was already imported, setting CUDA_VISIBLE_DEVICES=x afterwards to limit its GPUs completely confused it.
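A minimal sketch of the ordering rule behind that bug (the GPU index "1" here is just an example): the environment variable must be set before torch is imported anywhere in the process, because the first import freezes the visible device list.

```python
import os

# Must happen BEFORE any `import torch`, including one hidden inside a
# transitively imported submodule. If torch is already imported, later
# changes to CUDA_VISIBLE_DEVICES are ignored or trigger the
# "INTERNAL ASSERT FAILED" error quoted above.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only physical GPU 1

try:
    import torch  # safe: the env var is already in place

    if torch.cuda.is_available():
        # Within this process the selected GPU is now addressed as "cuda:0"
        assert torch.cuda.device_count() == 1
except ImportError:
    pass  # torch not installed here; the ordering rule still applies
```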

So I don't actually need this PR any more. But I still think it's a good idea - it's not great that quantisation is currently hardcoded to only ever use GPU 0, and it's better if it can use any GPU, at the user's request.

@@ -215,22 +215,25 @@ def quantize(
     use_triton: bool = False,
     use_cuda_fp16: bool = True,
     autotune_warmup_after_quantized: bool = False,
-    cache_examples_on_gpu: bool = True
+    cache_examples_on_gpu: bool = True,
+    cuda_GPU: int = 0
Collaborator commented on the diff:

should this be:

Suggested change:
-    cuda_GPU: int = 0
+    device: Union[str, torch.device] = "cuda:0"

?
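To illustrate the suggestion, here is a hypothetical sketch of how quantize() could normalise such a device argument; resolve_device is an illustrative helper under assumed semantics, not part of AutoGPTQ.

```python
from typing import Union

def resolve_device(device: Union[str, int] = "cuda:0") -> str:
    """Normalise a device argument to a torch-style device string."""
    if isinstance(device, int):
        # Accept a bare GPU index, like the PR's cuda_GPU parameter
        return f"cuda:{device}"
    # Already a device string such as "cuda:1" or "cpu"
    return device

print(resolve_device(1))         # cuda:1
print(resolve_device("cuda:0"))  # cuda:0
```

This keeps backwards compatibility with the existing "cuda:0" default while also accepting an integer index at the user's request.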

Qubitium (Contributor) commented:

You can set the GPU via CUDA_VISIBLE_DEVICES=N, where N is the GPU index on your system. "cuda:0" is not actually hardcoded to a physical device: it references the first GPU in torch's device list, which is read from CUDA_VISIBLE_DEVICES (or includes all GPUs if it is not set). It is quite common on multi-GPU systems to have CUDA_VISIBLE_DEVICES=4 in the environment while the code refers to "cuda:0".
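The remapping described above can be sketched in plain Python; logical_to_physical is an illustrative helper (not a torch API) showing how, with CUDA_VISIBLE_DEVICES=4, logical device 0 resolves to physical GPU 4.

```python
import os

def logical_to_physical(logical_index: int) -> int:
    """Map a logical torch device index to a physical GPU index,
    mimicking how CUDA honours CUDA_VISIBLE_DEVICES."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if not visible:
        return logical_index  # nothing set: identity mapping
    # The env var is a comma-separated list of physical indices
    return [int(i) for i in visible.split(",")][logical_index]

os.environ["CUDA_VISIBLE_DEVICES"] = "4"
print(logical_to_physical(0))  # physical GPU 4 is logical "cuda:0"
```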

