
BitsNBytes 4 bit quantization error message typo and logical errors in error message handling #30751

jkterry1 opened this issue May 10, 2024 · 3 comments


jkterry1 commented May 10, 2024

System Info

Newest versions of transformers, accelerate, and bitsandbytes in a Docker container (nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04) on an Ubuntu 20.04 host, and separately on an Arch Linux laptop.

Who can help?

@SunMarc and @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch

from transformers import BitsAndBytesConfig, RobertaForSequenceClassification, RobertaTokenizer
eval_model_path = "hubert233/GPTFuzz"
tokenizer = RobertaTokenizer.from_pretrained(eval_model_path)
eval_model = RobertaForSequenceClassification.from_pretrained(
    eval_model_path,
    low_cpu_mem_usage=True,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
)
eval_model.eval()

This also impacts other models when using BNB 4-bit quantization, e.g. meta-llama/Llama-2-7b-chat-hf.

Expected behavior

On a system where CUDA isn't working (e.g. a laptop with no NVIDIA GPU, or a container that wasn't properly connected to the host OS driver), running that code snippet gives an import error:

[Screenshot: the import error traceback]

The logic at issue starts here:

def validate_environment(self, *args, **kwargs):

  1. The if not torch.cuda.is_available(): check is seemingly bypassed on multiple systems with no working torch CUDA support, so execution falls through to if not (is_accelerate_available() and is_bitsandbytes_available()):
  2. is_bitsandbytes_available() and is_accelerate_available() call out to def is_bitsandbytes_available(): and def is_accelerate_available(min_version: str = ACCELERATE_MIN_VERSION): respectively. In the case where the first check is bypassed, the torch.cuda.is_available() check that is built into is_bitsandbytes_available() fails, resulting in the wildly confusing error message "Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes" even though accelerate and bitsandbytes are both installed.
  3. Ignoring the issue in the torch call that got me here in the first place, I believe and is the incorrect logical operator in if not (is_accelerate_available() and is_bitsandbytes_available()):, because the error message says the problem is accelerate not being present when it is actually the bitsandbytes check that fails and both packages are installed. Splitting this into two separate checks with separate messages would point at the actual culprit (see the sketch at the end of this section).
  4. The error message says "Using bitsandbytes 8-bit quantization" in the 4-bit version of the file.
  5. I suspect that the original issue, the torch CUDA availability check being erroneously passed, is caused by the logic in this line:

Additionally, having is_bitsandbytes_available() call torch.cuda.is_available() seems like a non-obvious and non-modular design choice that is likely to result in similarly misleading and hard-to-debug error messages in the future.

I think the underlying logical issues here also likely caused this other GitHub sub-issue: #29177 (comment)
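
For points 3 and 4, a minimal sketch of the kind of split I have in mind (illustrative only, not the actual transformers implementation; the class name and exact message wording here are assumptions):

# Illustrative sketch only, not the actual transformers code: the combined
# accelerate/bitsandbytes check is split in two so each error names the
# package that is really missing, and the message says "4-bit" here.
import torch

from transformers.utils import is_accelerate_available, is_bitsandbytes_available


class Bnb4BitQuantizerSketch:  # hypothetical stand-in for the real 4-bit quantizer class
    def validate_environment(self, *args, **kwargs):
        if not torch.cuda.is_available():
            raise RuntimeError("No GPU found. bitsandbytes 4-bit quantization requires CUDA.")
        if not is_accelerate_available():
            raise ImportError(
                "Using bitsandbytes 4-bit quantization requires Accelerate: pip install accelerate"
            )
        if not is_bitsandbytes_available():
            raise ImportError(
                "Using bitsandbytes 4-bit quantization requires the latest version of "
                "bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes"
            )

With the checks separated like this, a missing accelerate install and a missing bitsandbytes install each produce their own message instead of the combined 8-bit one.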

@jkterry1 jkterry1 changed the title BNB 4 bit error message typo; torch.cuda.is_available() error in newest version of torch inconsistently results in misleading error message when using bits and bytes quantization BNB 4 bit error message typo; torch.cuda.is_available() call error results in highly misleading error message when using bits and bytes quantization on a system without cuda May 10, 2024

cw235 commented May 10, 2024

It seems like you are facing an issue running the provided code snippet on a system where CUDA is not available, resulting in an import error related to torch.cuda. The issue stems from logic within the Transformers library that checks for CUDA availability before enabling bitsandbytes functionality.

Here are some steps and potential solutions to address this problem:

  1. Identifying the Issue: the library includes logic that checks for CUDA availability before enabling bitsandbytes functionality. That check is meant to ensure CUDA operations can actually be used, but on systems without NVIDIA GPUs or a proper CUDA configuration it surfaces as an import error.
  2. Solutions:
    • Option 1, Error Handling: handle the case where CUDA is not available gracefully, e.g. add conditional logic that skips the CUDA-dependent path when the system does not support it.
    • Option 2, Environment Configuration: ensure the Docker container is correctly configured to access the host system's CUDA driver; this is required for running CUDA-dependent operations inside the container.
    • Option 3, Alternative Device Configuration: specify a different device for model computation, such as CPU, when CUDA is not available, which avoids the CUDA-dependent quantization path entirely (a sketch of this fallback follows this comment).
  3. Further Investigation: review the logic in the quantizer module of the Transformers library that handles the CUDA availability checks; understanding it may explain why the conditional checks are not working as expected in your configuration.
  4. Community Support: reach out to the Transformers maintainers, such as @SunMarc and @younesbelkada, for help debugging the CUDA availability checks and bitsandbytes behavior.

By exploring these solutions and seeking guidance from the Transformers community, you can work towards resolving the import error caused by the CUDA availability check when running the provided code snippet in systems without NVIDIA GPUs or CUDA support.
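
For Option 3, a minimal fallback sketch (illustrative only; it reuses the snippet from the issue and simply skips 4-bit quantization when CUDA is unavailable):

# Illustrative fallback sketch: load with 4-bit quantization only when CUDA
# is actually available, otherwise fall back to a plain CPU load.
import torch

from transformers import BitsAndBytesConfig, RobertaForSequenceClassification, RobertaTokenizer

eval_model_path = "hubert233/GPTFuzz"
tokenizer = RobertaTokenizer.from_pretrained(eval_model_path)

if torch.cuda.is_available():
    eval_model = RobertaForSequenceClassification.from_pretrained(
        eval_model_path,
        low_cpu_mem_usage=True,
        device_map="auto",
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
        ),
    )
else:
    # No CUDA: skip bitsandbytes quantization entirely and run on CPU.
    eval_model = RobertaForSequenceClassification.from_pretrained(
        eval_model_path,
        low_cpu_mem_usage=True,
        device_map="cpu",
    )
eval_model.eval()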

@jkterry1 jkterry1 changed the title BNB 4 bit error message typo; torch.cuda.is_available() call error results in highly misleading error message when using bits and bytes quantization on a system without cuda BitsNBytes 4 bit quantization error message typo and logical errors in error message handling May 10, 2024

SunMarc (Member) commented May 13, 2024

Hi @jkterry1, thanks for this detailed report! For 3. and 4., let me know if you want to submit a PR to fix the logger message and split it into two checks; otherwise, I can do it! For 1. and 5., it is indeed strange that the first CUDA check passes but the second check in bitsandbytes fails. We can potentially remove the CUDA import check in is_bitsandbytes_available, but it would be better for the first check to work correctly.

jkterry1 (Author) commented

Thank you so much!

If you'd be willing to do the PRs yourself for 3 and 4, I'd be extraordinarily grateful (also note that you'll likely need to make the fix from 3 in the 8-bit version of this file as well).

Regarding points 1 and 5, I personally believe that you should remove the CUDA check from is_bitsandbytes_available, so that when the function returns False it is only for the expected reason, and perform all CUDA availability checks outside of it to prevent future unexpected errors (a rough sketch of what I mean is at the end of this comment).

Additionally, I think it would be prudent to verify that the CUDA check described in 5, which threw a false positive for me and started me down this journey, correctly returns a negative in test environments without GPUs. It threw a false positive both in the Docker image I described and on a random Arch Linux laptop, which suggests that either something very unlikely happened to me or there is something wrong in the logic.
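
Rough sketch of the separation mentioned above (illustrative only, not the current transformers implementation; the function bodies here are assumptions):

# Rough illustrative sketch, not the current transformers code: keep
# is_bitsandbytes_available purely about importability and perform the CUDA
# availability check separately in the caller (e.g. validate_environment).
import importlib.util

import torch


def is_bitsandbytes_available() -> bool:
    # Only answers "can bitsandbytes be imported?" and says nothing about CUDA.
    return importlib.util.find_spec("bitsandbytes") is not None


def validate_environment_sketch() -> None:
    # The CUDA check lives with the caller, so each failure mode gets its own
    # explicit error message instead of a misleading combined one.
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device found; bitsandbytes 4-bit quantization needs a GPU.")
    if not is_bitsandbytes_available():
        raise ImportError("bitsandbytes is not installed: pip install bitsandbytes")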
