[BUG] Tests for compression fail on GPU servers with bitsandbytes installed #507

Open
mryab opened this issue Sep 10, 2022
Labels: bug (Something isn't working), ci (Continuous Integration, tests, or deployment)

mryab (Member) commented Sep 10, 2022

Describe the bug
While working on #490, I found that if bitsandbytes is installed in a GPU-enabled environment, test_adaptive_compression fails with an error. This happens to be the only test that uses TrainingAverager under the hood.

I dug into it a bit, and the failure seems to be caused by "CUDA error: initialization error" from PyTorch, which, AFAIK, occurs when we try to initialize the CUDA context twice. More specifically, it appears when TrainingAverager initializes the optimizer states. My guess is that the context is created first when bitsandbytes is imported and then again when something (anything?) from GPU-enabled PyTorch is used later. We are sunsetting support for TrainingAverager anyway, but it's not obvious to me how to correctly migrate this test away from that class.
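One way to check the guess that importing bitsandbytes is what creates the CUDA context is to import it in a fresh interpreter and compare torch.cuda.is_initialized() before and after the import. The sketch below assumes only that PyTorch is installed; the helper name import_initializes_cuda and the probe script are hypothetical and not part of hivemind or its test suite. Caveat: torch.cuda.is_initialized() only reflects PyTorch's own lazy CUDA initialization, so a context created directly by a native library without going through PyTorch may not show up here.

```python
import subprocess
import sys
from typing import Optional

# Script run in a fresh interpreter (hypothetical probe): import torch, record
# whether PyTorch has initialized CUDA, import the module under test, then record
# the state again. The result is printed on a line tagged "PROBE" because some
# packages print banners to stdout when imported.
_PROBE = """
import importlib, sys
import torch
before = torch.cuda.is_initialized()
importlib.import_module(sys.argv[1])
after = torch.cuda.is_initialized()
print("PROBE", int(before), int(after))
"""


def import_initializes_cuda(module_name: str) -> Optional[bool]:
    """Report whether importing `module_name` makes PyTorch initialize CUDA.

    Returns True if CUDA was not initialized before the import but was after it,
    False otherwise, and None if the probe could not run (for example, torch or
    the module itself is not installed).
    """
    # A separate interpreter guarantees a clean CUDA state and a cold import cache,
    # so the result does not depend on what the current process already imported.
    proc = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    for line in proc.stdout.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "PROBE":
            before, after = bool(int(parts[1])), bool(int(parts[2]))
            return (not before) and after
    return None
```

Calling import_initializes_cuda("bitsandbytes") on the affected machine would show whether the import alone flips PyTorch's CUDA state; running the probe in a subprocess is a deliberate choice, since importing bitsandbytes in the current process would be a no-op if it were already imported.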

To Reproduce
Install the environment on a GPU-enabled system and run CUDA_LAUNCH_BLOCKING=1 pytest -s --full-trace tests/test_compression.py. Then uninstall bitsandbytes, comment out the parts of tests/test_compression.py that rely on it (mostly test_tensor_compression), and run the same command again.

Environment

  • Python 3.8.8
  • Commit 131f82c
  • PyTorch 1.12.1, bitsandbytes 0.32.3
  • NVIDIA RTX 2080 Ti GPU
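The Python and package versions above can be collected with the standard library alone. This is a minimal sketch; the function name collect_versions and its default package list are hypothetical, and it does not cover the commit hash or GPU model (those would need, for example, git or torch.cuda.get_device_name).

```python
import platform
from importlib import metadata  # part of the standard library since Python 3.8
from typing import Dict, Iterable, Optional


def collect_versions(
    packages: Iterable[str] = ("torch", "bitsandbytes"),
) -> Dict[str, Optional[str]]:
    """Map "python" and each package name to its installed version (None if absent)."""
    info: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Recording None makes "not installed" explicit, which matters here
            # because the bug depends on whether bitsandbytes is present.
            info[name] = None
    return info
```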
mryab added the bug and ci labels Sep 10, 2022