
Allow specifying GPU used for quantisation, overriding hardcoded cuda:0 #405

Open · wants to merge 1 commit into base: main
Conversation

TheBloke (Contributor) commented Nov 5, 2023

This simple change adds a new parameter to model.quantize(), called cuda_GPU, defaulting to 0.

When specified, this overrides the hardcoded cuda:0 which has been in the codebase since the dawn of time.

This will allow me to easily direct multiple GPTQ quantisations to separate GPUs on a multi-GPU system.

Until now I've achieved that by launching a separate script and setting CUDA_VISIBLE_DEVICES=x in the environment when calling that script.

But now I'd like to handle this with an import instead, for various reasons. I initially tried setting CUDA_VISIBLE_DEVICES in the module being imported, but I hit a scary PyTorch error when I tried this: RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=1, num_gpus=. EDIT: I've now solved that torch error!

I have cumulatively spent about 4 hours trying to debug and fix that, with no success. Then I wondered how hard it'd be to change AutoGPTQ to take the GPU param instead... and that took 4 minutes 😢 😁

Anyway, I hope this simple change is good enough. There are probably better ways to do it, but it at least has to be better than hardcoding CUDA_0, as has been the case forever.

TheBloke (Contributor, Author) commented Nov 5, 2023

Isn't this always the way?

5 minutes after making that PR, and after 4 hours 10 minutes total time on this issue, I figured out how to fix it so I can use CUDA_VISIBLE_DEVICES=x like I had been trying to do (on and off) for weeks.

The problem: I had a stray import torch I'd forgotten about, pulled in from a submodule of a submodule. Because torch was already imported, setting CUDA_VISIBLE_DEVICES=x afterwards to limit its GPUs completely confused it.
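A minimal sketch of the ordering rule behind that bug (the GPU index "1" here is just an example): the environment variable must be set before torch is imported anywhere in the process, because the first import freezes the visible device list.

```python
import os

# Must happen BEFORE any `import torch`, including one hidden inside a
# transitively imported submodule. If torch is already imported, later
# changes to CUDA_VISIBLE_DEVICES are ignored or trigger the
# "INTERNAL ASSERT FAILED" error quoted above.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only physical GPU 1

try:
    import torch  # safe: the env var is already in place

    if torch.cuda.is_available():
        # Within this process the selected GPU is now addressed as "cuda:0"
        assert torch.cuda.device_count() == 1
except ImportError:
    pass  # torch not installed here; the ordering rule still applies
```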

So I don't actually need this PR any more. But I still think it's a good idea - it's not great that quantisation is currently hardcoded to only ever use GPU 0, and it's better if it can use any GPU, at the user's request.

@@ -215,22 +215,25 @@ def quantize(
     use_triton: bool = False,
     use_cuda_fp16: bool = True,
     autotune_warmup_after_quantized: bool = False,
-    cache_examples_on_gpu: bool = True
+    cache_examples_on_gpu: bool = True,
+    cuda_GPU: int = 0
Collaborator commented on the diff:

should this be:

Suggested change:
-    cuda_GPU: int = 0
+    device: Union[str, torch.device] = "cuda:0"

?
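To illustrate the suggestion, here is a hypothetical sketch of how quantize() could normalise such a device argument; resolve_device is an illustrative helper under assumed semantics, not part of AutoGPTQ.

```python
from typing import Union

def resolve_device(device: Union[str, int] = "cuda:0") -> str:
    """Normalise a device argument to a torch-style device string."""
    if isinstance(device, int):
        # Accept a bare GPU index, like the PR's cuda_GPU parameter
        return f"cuda:{device}"
    # Already a device string such as "cuda:1" or "cpu"
    return device

print(resolve_device(1))         # cuda:1
print(resolve_device("cuda:0"))  # cuda:0
```

This keeps backwards compatibility with the existing "cuda:0" default while also accepting an integer index at the user's request.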

Qubitium (Contributor) commented:

You can set the GPU via CUDA_VISIBLE_DEVICES=N, where N is the GPU index on your system. "cuda:0" is not actually hardcoded to a physical device: it references the first GPU in torch's device list, which is read from CUDA_VISIBLE_DEVICES (or includes all GPUs if it is not set). It is quite common on multi-GPU systems to have CUDA_VISIBLE_DEVICES=4 in the environment while the code refers to "cuda:0".
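The remapping described above can be sketched in plain Python; logical_to_physical is an illustrative helper (not a torch API) showing how, with CUDA_VISIBLE_DEVICES=4, logical device 0 resolves to physical GPU 4.

```python
import os

def logical_to_physical(logical_index: int) -> int:
    """Map a logical torch device index to a physical GPU index,
    mimicking how CUDA honours CUDA_VISIBLE_DEVICES."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if not visible:
        return logical_index  # nothing set: identity mapping
    # The env var is a comma-separated list of physical indices
    return [int(i) for i in visible.split(",")][logical_index]

os.environ["CUDA_VISIBLE_DEVICES"] = "4"
print(logical_to_physical(0))  # physical GPU 4 is logical "cuda:0"
```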

