
"RuntimeError: CUDA error: an illegal memory access was encountered" when trying to run the plain demo. #3

mattzh72 opened this issue Feb 26, 2024 · 6 comments

mattzh72 commented Feb 26, 2024

Following the guide to run the demo, I received this error:

Command:

CUDA_VISIBLE_DEVICES=0 python3 SLD_demo.py --json-file demo/self_correction/data.json --input-dir demo/self_correction/src_image --output-dir demo/self_correction/results --mode self_correction --config demo_config.ini

Error:

----- Image Manipulation -----
Traceback (most recent call last):
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/SLD_demo.py", line 340, in <module>
    deletion_region = get_remove_region(
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/SLD_demo.py", line 49, in get_remove_region
    masks = run_sam(bbox=obj[1], image_source=image_source, models=models)
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/sld/utils.py", line 49, in run_sam
    masks, _ = sam.sam(
  File "/viscam/projects/concepts/mattzh1314/t2i-eval/model-runners/tsunghan-wu-sld/models/sam.py", line 59, in sam
    outputs = sam_model(**inputs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/transformers/models/sam/modeling_sam.py", line 1361, in forward
    vision_outputs = self.vision_encoder(
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/transformers/models/sam/modeling_sam.py", line 1033, in forward
    hidden_states = self.patch_embed(pixel_values)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/transformers/models/sam/modeling_sam.py", line 145, in forward
    embeddings = self.projection(pixel_values).permute(0, 2, 3, 1)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/viscam/u/mattzh1314/miniconda3/envs/sld/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The environment was set up brand new, per the instructions in the README. It appears that SAM is throwing this error.
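
For what it's worth, here is a minimal standalone sketch to check whether the SAM forward pass alone reproduces the illegal memory access. It assumes the demo uses the Hugging Face transformers SAM checkpoint (facebook/sam-vit-huge is my guess; the actual checkpoint name is whatever demo_config.ini / models/sam.py specifies):

# Sketch: isolate the transformers SAM forward pass that the traceback points at
# (models/sam.py -> sam_model(**inputs)). Checkpoint name is an assumption.
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda"
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)

image = Image.new("RGB", (512, 512), color=(127, 127, 127))  # synthetic stand-in image
input_boxes = [[[100.0, 100.0, 400.0, 400.0]]]  # one dummy box prompt (x0, y0, x1, y1)

inputs = processor(image, input_boxes=input_boxes, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.pred_masks.shape)

Running this (or the demo) with CUDA_LAUNCH_BLOCKING=1 also makes the reported failing line more reliable, since CUDA errors are otherwise raised asynchronously.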

@mattzh72
Author

Running nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:81:00.0 Off |                  Off |
| 30%   30C    P8    17W / 230W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@tsunghan-wu
Owner

tsunghan-wu commented Feb 26, 2024

Hey, did you run the demo code and run into issues, or are you using your own image?
It's strange, since we just use the off-the-shelf SAM module...

@mattzh72
Author

I'm running the demo code with zero modifications. I have a fresh conda environment that was created per the demo instructions.

@mattzh72
Author

I'm running on a single GPU; details from the nvidia-smi command are included above.

@tsunghan-wu
Owner

As I haven't encountered the same error, I was wondering if this error message is related to an OOM issue, as in this stackoverflow link and this github issue link. While I can't be certain this is the exact cause in your case, it might be worth considering. One suggestion would be to switch from SAM-vit-huge to SAM-vit-base here and re-run.
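
Something like this, assuming the SAM model is loaded through Hugging Face transformers as the traceback suggests (the actual loading code lives in models/sam.py, so adjust the names accordingly):

from transformers import SamModel, SamProcessor

# Suggested change: load the smaller ViT-B SAM checkpoint instead of ViT-H.
# (These are the public Hugging Face checkpoint names; the real code may read
# the checkpoint name from demo_config.ini instead of hard-coding it.)
# sam_model = SamModel.from_pretrained("facebook/sam-vit-huge").to("cuda")
sam_model = SamModel.from_pretrained("facebook/sam-vit-base").to("cuda")
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")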

Let me know if the new setting fixes the issue. You can also refer to #2, as it seems other people faced the same issue on low-memory GPUs.

@ChiehYunChen

Hey @mattzh72,
Have you solved the issue?

I think it might not be an OOM error, since I use one 3090 GPU with 24GB RAM (similar to yours) and successfully ran inference on the demo. I haven't encountered the issue you reported, but I did hit OOM when loading SDXL, which was later solved with half precision.
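
For reference, the half-precision loading I used looks roughly like this (a sketch assuming SDXL is loaded through diffusers; the actual pipeline class and checkpoint used by this repo may differ):

import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL weights in fp16 to roughly halve GPU memory usage.
# "stabilityai/stable-diffusion-xl-base-1.0" is the public base checkpoint;
# substitute whatever checkpoint the repo/config actually uses.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")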

Hope my experience helps! Following is my environment information:

OS: Ubuntu 20.04 LTS
GPU: 3090
Driver Version: 525.147.05
print(torch.__version__): 2.0.1+cu117
print(torch.version.cuda): 11.7
print(torch.backends.cudnn.version()): 8500
