Hello, I'm having some problems. RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #17

zhangkuncsdn · 2021-02-22T09:26:50Z

Training model (TrainedModels/pantheon/2021_02_22_15_28_21_generation_train_depth_3_lr_scale_0.1_act_lrelu_0.05)
Training model with the following parameters:
number of stages: 6
number of concurrently trained stages: 3
learning rate scaling: 0.1
non-linearity: lrelu
Training on image pyramid: [torch.Size([1, 3, 26, 42]), torch.Size([1, 3, 31, 51]), torch.Size([1, 3, 40, 66]), torch.Size([1, 3, 57, 94]), torch.Size([1, 3, 106, 175]), torch.Size([1, 3, 152, 250])]

stage [0/5]:: 0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main_train.py", line 118, in <module>
train(opt)
File "G:\ConSinGAN\ConSinGAN\training_generation.py", line 48, in train
fixed_noise, noise_amp, generator, d_curr = train_single_scale(d_curr, generator, reals, fixed_noise, noise_amp, opt, scale_num, writer)
File "G:\ConSinGAN\ConSinGAN\training_generation.py", line 156, in train_single_scale
gradient_penalty = functions.calc_gradient_penalty(netD, real, fake, opt.lambda_grad, opt.device)
File "G:\ConSinGAN\ConSinGAN\functions.py", line 122, in calc_gradient_penalty
create_graph=True, retain_graph=True, only_inputs=True)[0]
File "D:\Anaconda3\envs\ConSinGAN\lib\site-packages\torch\autograd\__init__.py", line 149, in grad
inputs, allow_unused)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

tohinz · 2021-02-23T10:21:34Z

Hi, that looks more like a problem with your PyTorch installation. Are you sure you have the correct CUDA and cuDNN versions installed for your graphics card and PyTorch version?
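
A quick way to gather the versions asked about here is a small Python script like the sketch below. It relies only on standard PyTorch attributes (`torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.is_available()`); the helper name `collect_env_info` is hypothetical. PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report.

```python
import importlib.util
import platform


def collect_env_info():
    """Gather the version details that matter for CUDA/cuDNN compatibility.

    Works even when PyTorch is not installed: the torch fields are then None.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,       # CUDA version this PyTorch build targets
        "cudnn": None,            # cuDNN version PyTorch reports (None if unavailable)
        "cuda_available": False,  # whether a usable GPU + driver were found
        "gpu": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in collect_env_info().items():
    print(f"{key}: {value}")
```

If `torch_cuda` is None, the installed build is CPU-only; otherwise, check that your GPU driver supports the reported CUDA version.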

zhangkuncsdn · 2021-02-23T11:30:20Z

Hi, does the current ConSinGAN environment support PyTorch 1.7?

tohinz · 2021-02-24T09:24:29Z

I haven't tested it with PyTorch 1.7, but in general it should work (at least I'd expect it to give you a different error message from the one above). The error is raised inside torch.autograd.grad(), which is why I believe it's a problem with your environment and not with the code itself.
I would suggest running the code on the CPU (use the flag --not_cuda) to see whether it works there or gives you a more informative error message. I haven't tested it on CPU myself, so you might have to add .to(torch.device('cpu')) in some places if PyTorch raises errors about a GPU/CPU mismatch.
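
The CPU fallback can be sketched as follows. This assumes `--not_cuda` is a plain on/off switch; the helpers `parse_args` and `select_device` are hypothetical, ConSinGAN's own option handling may differ, and `netG` and `noise` in the comments are placeholders. The point is that every module and tensor must end up on the same device, which is what the `.to(...)` calls are for.

```python
import argparse


def parse_args(argv=None):
    """Parse a --not_cuda switch (hypothetical stand-in for the real config)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--not_cuda", action="store_true",
                        help="train on the CPU even if a GPU is present")
    return parser.parse_args(argv)


def select_device(not_cuda: bool, cuda_available: bool) -> str:
    """Pick the device string: CPU if requested or if no GPU is usable."""
    if not_cuda or not cuda_available:
        return "cpu"
    return "cuda:0"


# With PyTorch installed, the result would be used roughly like this:
#   device = torch.device(select_device(opt.not_cuda, torch.cuda.is_available()))
#   netG = netG.to(device)
#   noise = noise.to(device)  # every tensor must be on the same device
opt = parse_args(["--not_cuda"])
print(select_device(opt.not_cuda, cuda_available=True))  # cpu
```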

zhangkuncsdn · 2021-02-24T11:17:40Z

Thank you very much. With the --not_cuda flag it runs. There's another question I'd like to ask you: if I want to train on a single-channel grayscale image, how should I modify the network?

tohinz · 2021-02-24T13:45:02Z

Just set --nc_im 1 and represent your image with shape (H x W x 1), i.e. 1 channel instead of 3 for RGB.
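
For a grayscale image loaded as a 2-D array, adding the channel axis described above is a single indexing step. A minimal NumPy sketch, assuming the image is already in memory (the helper name `to_single_channel` is hypothetical, and how the image is loaded is left open):

```python
import numpy as np


def to_single_channel(gray: np.ndarray) -> np.ndarray:
    """Return a grayscale image with an explicit channel axis: (H, W, 1)."""
    if gray.ndim == 2:
        return gray[:, :, np.newaxis]  # (H, W) -> (H, W, 1)
    if gray.ndim == 3 and gray.shape[2] == 1:
        return gray  # already in the expected layout
    raise ValueError(f"expected (H, W) or (H, W, 1), got {gray.shape}")


gray = np.zeros((164, 250), dtype=np.uint8)  # e.g. a 2-D array from an image loader
print(to_single_channel(gray).shape)  # (164, 250, 1)
```

If the loading code still expects three input channels, this reshape on its own may not be enough.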

zhangkuncsdn · 2021-02-24T14:04:55Z

I've set --nc_im 1, but I'm running into the following problem:
Training model (TrainedModels/07/2021_02_24_22_00_10_generation_train_depth_3_lr_scale_0.1_act_lrelu_0.05)
Training model with the following parameters:
number of stages: 6
number of concurrently trained stages: 3
learning rate scaling: 0.1
non-linearity: lrelu
Traceback (most recent call last):
File "main_train.py", line 118, in <module>
train(opt)
File "G:\ConSinGAN\ConSinGAN\training_generation.py", line 23, in train
real = functions.adjust_scales2image(real, opt)
File "G:\ConSinGAN\ConSinGAN\functions.py", line 185, in adjust_scales2image
real = imresize(real_, opt.scale1, opt)
File "G:\ConSinGAN\ConSinGAN\imresize.py", line 52, in imresize
im = np2torch(im,opt)
File "G:\ConSinGAN\ConSinGAN\imresize.py", line 26, in np2torch
x = color.rgb2gray(x)
File "D:\Anaconda3\envs\ConSinGAN\lib\site-packages\skimage\color\colorconv.py", line 799, in rgb2gray
rgb = _prepare_colorarray(rgb[..., :3])
File "D:\Anaconda3\envs\ConSinGAN\lib\site-packages\skimage\color\colorconv.py", line 152, in _prepare_colorarray
raise ValueError(msg)
ValueError: the input array must be have a shape == (.., ..,[ ..,] 3)), got (164, 250, 1)

tohinz · 2021-02-24T15:59:15Z

You will have to change the code slightly to adapt to this.
Another easy workaround is to convert your grayscale image to a "color image" with 3 channels, e.g. with OpenCV: cv2.cvtColor(gray_img, cv2.COLOR_GRAY2RGB)
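
Both options can be sketched in NumPy. `gray_to_rgb` does the same channel replication as the OpenCV call, without needing OpenCV. `to_gray_float` is a hypothetical guard for the gray branch of `np2torch` in imresize.py (which, per the traceback above, calls `color.rgb2gray`): it skips `rgb2gray` when the image already has a single channel. It assumes the rest of that branch expects a 2-D float array in [0, 1], as `skimage.color.rgb2gray` returns for 8-bit input; check this against the actual code before relying on it.

```python
import numpy as np


def gray_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Replicate one gray channel into three identical channels.

    Same idea as cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB), without OpenCV.
    Accepts (H, W) or (H, W, 1) and returns (H, W, 3).
    """
    if gray.ndim == 3 and gray.shape[2] == 1:
        gray = gray[:, :, 0]
    if gray.ndim != 2:
        raise ValueError(f"expected (H, W) or (H, W, 1), got {gray.shape}")
    return np.repeat(gray[:, :, np.newaxis], 3, axis=2)


def to_gray_float(x: np.ndarray, rgb2gray) -> np.ndarray:
    """Hypothetical guard for the gray branch of np2torch.

    Calls rgb2gray (e.g. skimage.color.rgb2gray) only for colour input.
    Single-channel input is squeezed to (H, W); unsigned-integer images are
    scaled to [0, 1], and float images are assumed to be in [0, 1] already.
    """
    if x.ndim == 3 and x.shape[2] == 1:
        x = x[:, :, 0]
    if x.ndim == 2:
        if np.issubdtype(x.dtype, np.unsignedinteger):
            return x.astype(np.float64) / np.iinfo(x.dtype).max
        return x.astype(np.float64)
    return rgb2gray(x)


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
print(gray_to_rgb(gray).shape)  # (2, 3, 3)
```

Inside np2torch, the guard would replace `x = color.rgb2gray(x)` with `x = to_gray_float(x, color.rgb2gray)`.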

zhangkuncsdn · 2021-02-25T07:29:41Z

I ran into some problems when changing the code. Can you give me some advice?

tohinz · 2021-03-01T15:45:09Z

What are the problems?

FluppyBird · 2023-01-12T05:17:14Z

I had the same problem 3 days ago, and, somewhat unexpectedly, it ran after I installed the environment with conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch. This is the same torch version as SinGAN uses, so maybe you can try it. :)