
"RuntimeError: CUDA error: an illegal memory access was encountered" when trying to run the plain demo. #3

mattzh72 opened this issue Feb 26, 2024 · 6 comments

mattzh72 commented Feb 26, 2024

Following the guide to run the demo, I received this error:

Command:

CUDA_VISIBLE_DEVICES=0 python3 SLD_demo.py --json-file demo/self_correction/data.json --input-dir demo/self_correction/src_image --output-dir demo/self_correction/results --mode self_correction --config demo_config.ini

Error:

----- Image Manipulation -----
Traceback (most recent call last):
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/SLD_demo.py", line 340, in <module>
    deletion_region = get_remove_region(
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/SLD_demo.py", line 49, in get_remove_region
    masks = run_sam(bbox=obj[1], image_source=image_source, models=models)
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/sld/utils.py", line 49, in run_sam
    masks, _ = sam.sam(
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/models/sam.py", line 59, in sam
    outputs = sam_model(**inputs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/transformers/models/sam/modeling_sam.py", line 1361, in forward
    vision_outputs = self.vision_encoder(
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/transformers/models/sam/modeling_sam.py", line 1033, in forward
    hidden_states = self.patch_embed(pixel_values)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/transformers/models/sam/modeling_sam.py", line 145, in forward
    embeddings = self.projection(pixel_values).permute(0, 2, 3, 1)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The environment was set up brand new, per the instructions in the README. It appears that SAM is throwing this error.
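
For what it's worth, here is a minimal standalone sketch to check whether the SAM forward pass alone reproduces the illegal memory access. It assumes the demo uses the Hugging Face transformers SAM checkpoint (facebook/sam-vit-huge is my guess; the actual checkpoint name is whatever demo_config.ini / models/sam.py specifies):

# Sketch: isolate the transformers SAM forward pass that the traceback points at
# (models/sam.py -> sam_model(**inputs)). Checkpoint name is an assumption.
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda"
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)

image = Image.new("RGB", (512, 512), color=(127, 127, 127))  # synthetic stand-in image
input_boxes = [[[100.0, 100.0, 400.0, 400.0]]]  # one dummy box prompt (x0, y0, x1, y1)

inputs = processor(image, input_boxes=input_boxes, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.pred_masks.shape)

Running this (or the demo) with CUDA_LAUNCH_BLOCKING=1 also makes the reported failing line more reliable, since CUDA errors are otherwise raised asynchronously.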

@mattzh72
Author

Running nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:81:00.0 Off |                  Off |
| 30%   30C    P8    17W / 230W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@tsunghan-wu
Owner

tsunghan-wu commented Feb 26, 2024

Hey, did you run the demo code and run into issues, or are you using your own image?
It's strange, since we just use the off-the-shelf SAM module...

@mattzh72
Author

I'm running the demo code with zero modifications. I have a fresh conda environment that was created per the demo instructions.

@mattzh72
Author

I'm running on a single GPU; details from the nvidia-smi command are included above.

@tsunghan-wu
Owner

As I haven't encountered the same error, I was wondering if this error message is related to an OOM issue, as in this stackoverflow link and this github issue link. While I can't be certain this is the exact cause in your case, it might be worth considering. One suggestion would be to switch from SAM-vit-huge to SAM-vit-base here and re-run.
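
Something like this, assuming the SAM model is loaded through Hugging Face transformers as the traceback suggests (the actual loading code lives in models/sam.py, so adjust the names accordingly):

from transformers import SamModel, SamProcessor

# Suggested change: load the smaller ViT-B SAM checkpoint instead of ViT-H.
# (These are the public Hugging Face checkpoint names; the real code may read
# the checkpoint name from demo_config.ini instead of hard-coding it.)
# sam_model = SamModel.from_pretrained("facebook/sam-vit-huge").to("cuda")
sam_model = SamModel.from_pretrained("facebook/sam-vit-base").to("cuda")
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")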

Let me know if the new setting fixes the issue. You can also refer to #2, as it seems other people faced the same issue on low-memory GPUs.

@ChiehYunChen

Hey @mattzh72,
Have you solved the issue?

I think it might not be an OOM error, since I use one 3090 GPU with 24GB RAM (similar to yours) and successfully ran inference on the demo. I haven't encountered the issue you reported, but I did hit OOM when loading SDXL, which was later solved with half precision.
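
For reference, the half-precision loading I used looks roughly like this (a sketch assuming SDXL is loaded through diffusers; the actual pipeline class and checkpoint used by this repo may differ):

import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL weights in fp16 to roughly halve GPU memory usage.
# "stabilityai/stable-diffusion-xl-base-1.0" is the public base checkpoint;
# substitute whatever checkpoint the repo/config actually uses.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")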

Hope my experience helps! Following is my environment information:

OS: Ubuntu 20.04 LTS
GPU: 3090
Driver Version: 525.147.05
print(torch.__version__): 2.0.1+cu117
print(torch.version.cuda): 11.7
print(torch.backends.cudnn.version()): 8500
