BitsAndBytes 4-bit quantization: error message typo and logical errors in error message handling #30751
Comments
It seems like you are facing an issue with running the provided code snippet on a system where CUDA is not available, resulting in an import error related to torch.cuda. The issue stems from logic within the Transformers library that checks for CUDA availability before enabling bitsandbytes functionality. Here are some steps and potential solutions to address this problem:
By exploring these solutions and seeking guidance from the Transformers community, you can work towards resolving the import error caused by the CUDA availability check when running the provided code snippet on systems without NVIDIA GPUs or CUDA support.
Hi @jkterry1, thanks for this detailed report! For 3. and 4., let me know if you want to submit a PR to fix the logger message and split it into two checks! Otherwise, I can do it! For 1. and 5., it is indeed strange that the first CUDA check passes but the second check in bitsandbytes fails. We can potentially remove the CUDA import check in
Thank you so much! If you'd be willing to do PRs yourself for 3 and 4, I'd be extraordinarily grateful (also note that you'll likely need to make the fix in 3 to the 8-bit version of this file as well). Regarding options 1 and 4, I personally believe that you should remove the CUDA check from `is_bitsandbytes_available` so that when the function returns False it's only for the expected reason, and perform all CUDA availability checks outside of it to prevent future unexpected errors. Additionally, I think it would be prudent to verify that the CUDA-check snippet described in 4, which threw a false positive for me and started me down this journey, correctly returns a negative in test environments without GPUs: it threw a false positive both in the docker image I described and on a random Arch Linux laptop, suggesting either something very unlikely happened to me or something is wrong in the logic.
System Info
Newest version of transformers, accelerate, bitsandbytes in a docker container (nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04), Ubuntu 20.04 system, Arch Linux laptop
Who can help?
@SunMarc and @younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Impacts other models when using BNB 4-bit, e.g. `meta-llama/Llama-2-7b-chat-hf`.
Expected behavior
On a system where CUDA isn't working (e.g. a laptop with no NVIDIA GPU, or a container that wasn't properly connected to the host OS driver), running that code snippet gives an import error.

The logic at issue starts in `transformers/src/transformers/quantizers/quantizer_bnb_4bit.py`, line 60 in e0c3cee: the `if not torch.cuda.is_available():` error message is seemingly bypassed frequently on multiple systems with no torch CUDA functionality, passing the system state on to `if not (is_accelerate_available() and is_bitsandbytes_available()):`.
`is_bitsandbytes_available()` calls out to `transformers/src/transformers/utils/import_utils.py`, line 749 in e0c3cee, and `is_accelerate_available()` calls out to `transformers/src/transformers/utils/import_utils.py`, line 819 in e0c3cee.

The `if not torch.cuda.is_available():` check that's built into `is_bitsandbytes_available()` fails, resulting in the wildly confusing error message "Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`" when accelerate and bitsandbytes are both installed.

`and` is the incorrect logical operator to use in `transformers/src/transformers/quantizers/quantizer_bnb_4bit.py`, line 63 in e0c3cee: combining the two availability checks in one condition means the guard cannot tell the user which of the two packages is actually missing.
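As an illustration of the split-check suggestion, here is a hedged sketch (the function name and message wording are hypothetical, not the actual transformers code) in which each missing dependency produces its own message, instead of one combined message that fires for either failure:

```python
def validate_bnb_environment(accelerate_installed, bitsandbytes_installed):
    """Return one error message per missing dependency.

    Hypothetical replacement for a combined
    `if not (is_accelerate_available() and is_bitsandbytes_available())`
    guard, which cannot distinguish which package is absent.
    """
    errors = []
    if not accelerate_installed:
        errors.append(
            "Using `bitsandbytes` 4-bit quantization requires Accelerate: "
            "pip install accelerate"
        )
    if not bitsandbytes_installed:
        errors.append(
            "Using `bitsandbytes` 4-bit quantization requires the latest "
            "version of bitsandbytes: pip install -U bitsandbytes"
        )
    return errors
```

With separate checks, a user who already has both packages installed but a broken CUDA setup would never be told to reinstall them.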
The error message also says "`bitsandbytes` 8-bit quantization" in the 4-bit version of the file: `transformers/src/transformers/quantizers/quantizer_bnb_4bit.py`, line 29 in e0c3cee.
Additionally, having `is_bitsandbytes_available()` call `torch.cuda.is_available()` seems like a non-obvious and non-modular design choice that's likely to result in similarly misleading and hard-to-debug error messages in the future. I think the underlying logical issues here also likely resulted in this other GitHub sub-issue: #29177 (comment)
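The coupling complaint can be demonstrated in miniature with plain booleans (all names here are hypothetical stand-ins for the real transformers helpers): when the availability predicate secretly folds in a CUDA check, the caller can only assume the package is missing, so its error message blames installation regardless of the real cause.

```python
def is_lib_available(lib_installed, cuda_available):
    # Mimics the current coupling: the predicate returns False whenever
    # CUDA is absent, even though the library itself is installed.
    return lib_installed and cuda_available


def import_guard(lib_installed, cuda_available):
    if not is_lib_available(lib_installed, cuda_available):
        # The caller cannot see *why* the predicate failed, so the
        # message is misleading whenever the real problem is CUDA.
        return "error: please install the library"
    return "ok"
```

The interesting case is an installed library on a machine without working CUDA: the guard still tells the user to install the library, which is exactly the confusing message described in this issue.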