RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #71

Open
ZuoJiaxing opened this issue May 13, 2020 · 2 comments

Comments

@ZuoJiaxing

I followed the instructions you posted to install Docker on Ubuntu 18.04 with an RTX 2020Ti. However, when I run "python3 tasks/R2R/train.py", I get the following error. I have tried many Docker images that meet your requirements (CUDA 9.2 + cuDNN 7 + PyTorch 1.1), but I cannot get past this error. Please help me with this. Thanks!

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "tasks/R2R/train.py", line 163, in <module>
    train_val()
  File "tasks/R2R/train.py", line 156, in train_val
    dropout_ratio, bidirectional=bidirectional).cuda()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
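
The traceback shows the failure occurring while an RNN module is moved to the GPU: .cuda() reaches flatten_parameters(), which goes through cuDNN. A minimal sketch for checking this outside of train.py might look like the following. The function name try_lstm_on_gpu and the layer sizes are made up for illustration and are not taken from the repository; the assumption is that any small LSTM exercises the same cuDNN path.

```python
def try_lstm_on_gpu():
    """Move a tiny LSTM to the GPU and run one forward pass.

    Returns a short status string instead of raising, so it can also be
    run on machines without PyTorch or without a visible GPU.
    """
    try:
        import torch
    except ImportError:
        return "skipped: torch is not installed"

    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible to PyTorch"

    try:
        # Hypothetical sizes, chosen only to keep the check small.
        lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True).cuda()
        x = torch.randn(2, 4, 8, device="cuda")
        out, _ = lstm(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as err:
        return "failed: {}".format(err)
    return "ok: output shape {}".format(tuple(out.shape))


if __name__ == "__main__":
    print(try_lstm_on_gpu())
```

If this reports "failed" with the same cuDNN message inside the container, the problem is likely in the CUDA/cuDNN/PyTorch/GPU combination rather than in the training script itself.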

@ManaalAhi

ManaalAhi commented Dec 21, 2020

Hello,

There is a conflict between the cuDNN and PyTorch versions. To fix this, try changing the PyTorch version. You can find a build that is compatible with your CUDA and cuDNN versions at the following link: https://pytorch.org/get-started/previous-versions/
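
Before picking a different build, it can help to see which versions are actually in use inside the container. The sketch below collects them with standard PyTorch attributes; report_versions is a hypothetical helper name, and the function returns empty values when PyTorch is not installed.

```python
def report_versions():
    """Collect the PyTorch, CUDA and cuDNN versions PyTorch was built with,
    plus the name and compute capability of each visible GPU."""
    info = {"torch": None, "cuda_build": None, "cudnn": None, "devices": []}
    try:
        import torch
    except ImportError:
        return info  # PyTorch missing: nothing more to report

    info["torch"] = str(torch.__version__)
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    try:
        info["cudnn"] = torch.backends.cudnn.version()  # e.g. 7604, or None
    except RuntimeError as err:
        info["cudnn"] = "error: {}".format(err)

    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            info["devices"].append(
                (torch.cuda.get_device_name(index),
                 torch.cuda.get_device_capability(index))
            )
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print("{}: {}".format(key, value))
```

The "cuda_build" value is the CUDA version the PyTorch binary was compiled against, which is the one to match on the previous-versions page.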

@staale92

staale92 commented Mar 6, 2023

Hi,
Note that the error might also be caused by your GPU being incompatible with CUDA 9.2. I had the same error, and in my case (RTX A6000 GPU), replacing the following lines in the Dockerfile:

FROM nvidia/cudagl:9.2-devel-ubuntu18.04

# Install cudnn
ENV CUDNN_VERSION 7.6.4.38
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
    libcudnn7=$CUDNN_VERSION-1+cuda9.2 \
    libcudnn7-dev=$CUDNN_VERSION-1+cuda9.2 \
    && \
    apt-mark hold libcudnn7 && \
    rm -rf /var/lib/apt/lists/*

with:

FROM nvidia/cudagl:11.1-devel-ubuntu18.04

# Install cudnn
ENV CUDNN_VERSION 7.6.4.38
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list

did the trick. Your GPU seems to require CUDA > 10.0 (you can check the CUDA compatibility here).
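
As a rough illustration of the compatibility check, the sketch below compares a GPU's compute capability against the oldest CUDA toolkit that can target it. The table is an assumption based on my reading of NVIDIA's release notes and covers only a few architectures, so please verify it against the official compatibility table; the names MIN_CUDA_FOR_CAPABILITY and cuda_supports are hypothetical.

```python
# Assumed minimum CUDA toolkit (major, minor) per compute capability.
# Not exhaustive; check NVIDIA's documentation before relying on it.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (RTX 20 series)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series, RTX A6000)
}


def cuda_supports(capability, cuda_version):
    """Return True/False if the toolkit can target the GPU,
    or None when the capability is not in the table."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        return None
    return cuda_version >= minimum  # tuples compare element by element


if __name__ == "__main__":
    print(cuda_supports((8, 6), (9, 2)))   # RTX A6000 with CUDA 9.2
    print(cuda_supports((8, 6), (11, 1)))  # RTX A6000 with CUDA 11.1
```

The compute capability of the installed GPU can be read with torch.cuda.get_device_capability() once a working PyTorch build is available.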
