train error #251

Open
kkkkkk123-ops opened this issue Mar 20, 2024 · 4 comments

Comments

@kkkkkk123-ops

When I run bash scripts/DINO_train.sh /path/to/your/COCODIR to train the model, I get the following error:
Traceback (most recent call last):
  File "main.py", line 395, in <module>
    main(args)
  File "main.py", line 280, in main
    train_stats = train_one_epoch(
  File "/root/onethingai-tmp/plaque_detection/DINO-main/engine.py", line 52, in train_one_epoch
    loss_dict = criterion(outputs, targets)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/onethingai-tmp/plaque_detection/DINO-main/models/dino/dino.py", line 569, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/onethingai-tmp/plaque_detection/DINO-main/models/dino/matcher.py", line 84, in forward
    cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/functional.py", line 1222, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
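
The error text itself recommends CUDA_LAUNCH_BLOCKING=1. CUDA kernels run asynchronously, so the frame that raises (torch.cdist here) is not necessarily the call that failed. Below is a minimal sketch of enabling it from Python. It assumes the lines go at the very top of the entry script (main.py in the traceback), before torch is imported.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at
# the call that actually failed. The variable has to be set before CUDA is
# initialized, so the safest place is before `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
```

Setting the variable in the shell before launching the training script should have the same effect, since child processes inherit it. With synchronous launches, a device-side assert of this kind often turns out to be an out-of-range index, such as a class label that is not smaller than the configured number of classes.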

@PedroCHS

Hi, we are facing the same problem. Have you managed to solve it?

@kkkkkk123-ops
Author

> Hi, we are facing the same problem. Have you managed to solve it?

I solved it, but I don't remember exactly how. As far as I recall, you could try installing exactly mmdet=2.25.3 and mmcv=1.5.0; the versions of these two packages must match strictly, otherwise you get strange errors. Then make sure that your dataset is in COCO format and that num_class in your config file fits your dataset: it should be n+1, where n is the number of categories in your dataset.
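
The dataset advice above can be checked before training. Below is a minimal sketch, assuming the annotations follow the usual COCO layout (a "categories" list with "id" fields and an "annotations" list with "category_id" fields). The helper name find_label_problems and the example data are hypothetical and not part of the DINO repository.

```python
from typing import Any, Dict, List


def find_label_problems(coco: Dict[str, Any], num_classes: int) -> List[str]:
    """Return human-readable problems found in a COCO-style annotation dict."""
    problems = []
    declared_ids = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        cat_id = ann.get("category_id")
        if cat_id not in declared_ids:
            # The annotation refers to a category the file never declares.
            problems.append(
                f"annotation {ann.get('id')}: category_id {cat_id} is not declared"
            )
        elif not 0 <= cat_id < num_classes:
            # The label would index past the end of the classification head.
            problems.append(
                f"annotation {ann.get('id')}: category_id {cat_id} "
                f"is outside [0, {num_classes})"
            )
    return problems


# Hypothetical in-memory example; a real file would be read with json.load.
example = {
    "categories": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "annotations": [
        {"id": 10, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 11, "category_id": 2, "bbox": [1, 1, 4, 4]},
    ],
}

print(find_label_problems(example, num_classes=2))  # flags annotation 11
print(find_label_problems(example, num_classes=3))  # []
```

In this example two categories with ids 1 and 2 need a class count of 3, which matches the n+1 rule described above.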

@PedroCHS

Thanks a lot, it works!

@adalinadalin

@PedroCHS Hey, I encountered the same error message. If my COCO-format dataset has seven categories, how should I set the config? num_class = 8 and dn_labelbook_size = 8? Thank you.
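
This question is not answered in the thread. One way to reason about it is sketched below. It assumes the DETR-style convention that the class count must exceed the largest category id (max id + 1), which agrees with the n+1 advice earlier in the thread when ids start at 1. The helper suggest_num_classes is hypothetical. The further assumption that dn_labelbook_size should be at least as large as the class count is not confirmed anywhere in this thread.

```python
from typing import Dict, List


def suggest_num_classes(categories: List[Dict]) -> int:
    """Smallest class count that covers every category id (max id + 1)."""
    if not categories:
        raise ValueError("no categories found")
    return max(cat["id"] for cat in categories) + 1


# Seven categories with ids 1..7, as in many COCO-style exports.
seven_from_one = [{"id": i, "name": f"class_{i}"} for i in range(1, 8)]
# Seven categories with ids 0..6.
seven_from_zero = [{"id": i, "name": f"class_{i}"} for i in range(7)]

print(suggest_num_classes(seven_from_one))   # 8
print(suggest_num_classes(seven_from_zero))  # 7
```

Under these assumptions, 8 would be the right value for seven categories only when the ids run from 1 to 7. It is worth checking the actual ids in the annotation file, since non-contiguous ids would need a larger value.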
