Allow specifying GPU used for quantisation, overriding hardcoded cuda:0 #405
This simple change adds a new parameter to `model.quantize()`, called `cuda_GPU`, defaulting to `0`. When specified, it overrides the hardcoded `cuda:0`, which has been in the codebase since the dawn of time. This will let me easily direct multiple GPTQ quantisations to separate GPUs on a multi-GPU system.
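
For illustration, here's a minimal sketch of the intended usage, based on AutoGPTQ's usual quantisation flow (the model name and calibration example are just placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "facebook/opt-125m"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library.")
]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

# New: run this quantisation on GPU 1; a second script could pass
# cuda_GPU=0 (the default) to keep using the first GPU as before.
model.quantize(examples, cuda_GPU=1)
```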
Until now I've achieved that by launching a separate script and setting `CUDA_VISIBLE_DEVICES=x` in the environment when calling that script. But now I'd like to handle this with an `import` instead, for various reasons. I initially tried setting `CUDA_VISIBLE_DEVICES` in the module being imported, but I hit a scary PyTorch error when I tried this (`RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=1, num_gpus=`). EDIT: I've now solved that torch error!

I'd cumulatively spent about 4 hours trying to debug and fix that, with no success. Then I wondered how hard it'd be to change AutoGPTQ to take in the GPU param instead... and that took 4 minutes 😢 😁
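
For anyone hitting the same error: as far as I can tell, `CUDA_VISIBLE_DEVICES` is only read when the process first initialises CUDA, so setting it from inside an imported module is usually too late if torch has already been loaded. A minimal sketch of the ordering that matters (the actual fix in my case may differ; this just shows the general pitfall):

```python
import os

# CUDA_VISIBLE_DEVICES is read when the process first initialises CUDA,
# so it must be set before torch (or anything that imports it) is loaded.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# Only physical GPU 1 is visible now, and it is exposed as cuda:0.
# Asking for cuda:1 at this point fails, because only one device is
# visible; that's the kind of mismatch the assert above complains about.
print(torch.cuda.device_count())  # -> 1
device = torch.device("cuda:0")   # physical GPU 1
```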
Anyway, I hope this simple change is good enough. There are probably better ways to do it, but it at least has to be better than hardcoding `CUDA_0` as has been the case forever.