BitsAndBytes 4-bit quantization: error message typo and logical errors in error message handling #30751
Comments
It seems like you are facing an issue with running the provided code snippet on a system where CUDA is not available, resulting in an import error related to torch.cuda. The issue stems from logic within the Transformers library that checks for CUDA availability before enabling bitsandbytes functionality. Here are some steps and potential solutions to address this problem:
By exploring these solutions and seeking guidance from the Transformers community, you can work towards resolving the import error caused by the CUDA availability check when running the provided code snippet on systems without NVIDIA GPUs or CUDA support.
Hi @jkterry1, thanks for this detailed report! For 3. and 4., let me know if you want to submit a PR to fix the logger message and split it into two checks! Otherwise, I can do it! For 1. and 5., it is indeed strange that the first CUDA check passes but the second check in bitsandbytes fails. We can potentially remove the CUDA import check in
Thank you so much! If you'd be willing to do PRs yourself for 3 and 4, I'd be extraordinarily grateful (also note that you'll likely need to make the fix in 3 to the 8-bit version of this file as well). Regarding options 1 and 4, I personally believe that you should remove the CUDA check from `is_bitsandbytes_available` so that when the function returns False it's only for the expected reason, and perform all CUDA availability checks outside of it to prevent future unexpected errors. Additionally, I think it would be prudent to verify that the CUDA-check snippet described in 4, which threw a false positive for me and started me down this journey, correctly returns a negative in test environments without GPUs: it threw a false positive both in the docker image I described and on a random Arch Linux laptop, suggesting either something very unlikely happened to me or something is wrong in the logic.
System Info
Newest version of transformers, accelerate, bitsandbytes in a docker container (nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04), Ubuntu 20.04 system, Arch Linux laptop
Who can help?
@SunMarc and @younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Impacts other models when using BNB 4-bit, e.g. `meta-llama/Llama-2-7b-chat-hf`.
Expected behavior
On a system where CUDA isn't working (e.g. a laptop with no NVIDIA GPU, or a container that wasn't properly connected to the host OS driver), running that code snippet gives an import error.

The logic at issue starts in `transformers/src/transformers/quantizers/quantizer_bnb_4bit.py`, line 60 in e0c3cee: the `if not torch.cuda.is_available():` error message is seemingly bypassed frequently on multiple systems with no torch CUDA functionality, passing the system state on to `if not (is_accelerate_available() and is_bitsandbytes_available()):`.
`is_bitsandbytes_available()` calls out to `transformers/src/transformers/utils/import_utils.py`, line 749 in e0c3cee, and `is_accelerate_available()` calls out to `transformers/src/transformers/utils/import_utils.py`, line 819 in e0c3cee.

The `if not torch.cuda.is_available():` check that's built into `is_bitsandbytes_available()` fails, resulting in the wildly confusing error message "Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`" when accelerate and bitsandbytes are both installed.

`and` is the incorrect logical operator to use in `transformers/src/transformers/quantizers/quantizer_bnb_4bit.py`, line 63 in e0c3cee: combining the two availability checks in one condition means the guard cannot tell the user which of the two packages is actually missing.
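As an illustration of the split-check suggestion, here is a hedged sketch (the function name and message wording are hypothetical, not the actual transformers code) in which each missing dependency produces its own message, instead of one combined message that fires for either failure:

```python
def validate_bnb_environment(accelerate_installed, bitsandbytes_installed):
    """Return one error message per missing dependency.

    Hypothetical replacement for a combined
    `if not (is_accelerate_available() and is_bitsandbytes_available())`
    guard, which cannot distinguish which package is absent.
    """
    errors = []
    if not accelerate_installed:
        errors.append(
            "Using `bitsandbytes` 4-bit quantization requires Accelerate: "
            "pip install accelerate"
        )
    if not bitsandbytes_installed:
        errors.append(
            "Using `bitsandbytes` 4-bit quantization requires the latest "
            "version of bitsandbytes: pip install -U bitsandbytes"
        )
    return errors
```

With separate checks, a user who already has both packages installed but a broken CUDA setup would never be told to reinstall them.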
The error message also says "`bitsandbytes` 8-bit quantization" in the 4-bit version of the file: `transformers/src/transformers/quantizers/quantizer_bnb_4bit.py`, line 29 in e0c3cee.
Additionally, having `is_bitsandbytes_available()` call `torch.cuda.is_available()` seems like a non-obvious and non-modular design choice that's likely to result in similarly misleading and hard-to-debug error messages in the future. I think the underlying logical issues here also likely resulted in this other GitHub sub-issue: #29177 (comment)
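The coupling complaint can be demonstrated in miniature with plain booleans (all names here are hypothetical stand-ins for the real transformers helpers): when the availability predicate secretly folds in a CUDA check, the caller can only assume the package is missing, so its error message blames installation regardless of the real cause.

```python
def is_lib_available(lib_installed, cuda_available):
    # Mimics the current coupling: the predicate returns False whenever
    # CUDA is absent, even though the library itself is installed.
    return lib_installed and cuda_available


def import_guard(lib_installed, cuda_available):
    if not is_lib_available(lib_installed, cuda_available):
        # The caller cannot see *why* the predicate failed, so the
        # message is misleading whenever the real problem is CUDA.
        return "error: please install the library"
    return "ok"
```

The interesting case is an installed library on a machine without working CUDA: the guard still tells the user to install the library, which is exactly the confusing message described in this issue.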