torch.cuda.OutOfMemoryError: CUDA out of memory. real_basicvsr/basicvsr++ errors at runtime #2117

Open · 3 tasks done
txy00001 opened this issue Jan 25, 2024 · 2 comments
Labels: kind/bug (something isn't working)

@txy00001

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmagic

Environment

(environment screenshot)
CUDA 11.7, cuDNN 8.9, GPU: NVIDIA RTX 4090

Reproduces the problem - code sample

import os
import time
from mmagic.apis import MMagicInferencer
from mmengine import mkdir_or_exist

# Create a MMagicInferencer instance and run inference.
video = '/home/txy/code/blur/video/6.mp4'
result_out_dir = '/home/txy/code/blur/output/6.mp4'
mkdir_or_exist(os.path.dirname(result_out_dir))

beg = time.time()
editor = MMagicInferencer('real_basicvsr', device='cuda:1')
results = editor.infer(video=video, result_out_dir=result_out_dir)
print(time.time() - beg)  # elapsed seconds

Reproduces the problem - command or script

Same script as above, run directly with Python.

Reproduces the problem - error message

/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: step_counter

01/25 17:15:56 - mmengine - WARNING - Failed to search registry with scope "mmagic" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmagic" is a correct scope, or whether the registry is initialized.
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the save_dir argument.
warnings.warn(f'Failed to add {vis_backend.__class__}, '
Traceback (most recent call last):
File "/home/txy/code/blur/demo/bas_real/real_infer_video.py", line 12, in
results = editor.infer(video=video, result_out_dir=result_out_dir)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/mmagic_inferencer.py", line 231, in infer
return self.inferencer(
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/init.py", line 110, in call
return self.inferencer(**kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 139, in call
results = self.base_call(**kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 165, in base_call
preds = self.forward(data, **forward_kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/video_restoration_inferencer.py", line 127, in forward
result = self.model(
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/base_models/base_edit_model.py", line 109, in forward
return self.forward_tensor(inputs, data_samples, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/real_esrgan/real_esrgan.py", line 112, in forward_tensor
feats = self.generator_ema(inputs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/real_basicvsr/real_basicvsr_net.py", line 88, in forward
residues = self.image_cleaning(lqs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/basicvsr/basicvsr_net.py", line 214, in forward
return self.main(feat)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 59.33 GiB (GPU 1; 23.65 GiB total capacity; 2.92 GiB already allocated; 19.91 GiB free; 2.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
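
Note on the allocator hint at the end of this message: max_split_size_mb / PYTORCH_CUDA_ALLOC_CONF addresses fragmentation, where reserved memory far exceeds allocated memory. That is not the case here, since the single failed allocation (59.33 GiB) is more than twice the card's total capacity (23.65 GiB), so no allocator setting can help; the per-call input itself has to be made smaller. For reference, this is how the hint would be applied (it must be set before the first CUDA allocation in the process):

import os

# Caps the size of splittable cached blocks to reduce fragmentation.
# Note: this does not increase total GPU memory, so it cannot satisfy
# an allocation request that exceeds the card's capacity.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'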

Additional information

I want to super-resolve some videos, 1 s and 5 s long, at 1280×720 and 3160×2160. With both real_basicvsr and basicvsr++ I get the error above (screenshot attached showing the same message). I switched servers and videos and the problem persists. How can I solve it?

@txy00001 added the kind/bug (something isn't working) label on Jan 25, 2024
@Feynman1999

Feynman1999 commented Jan 26, 2024

Just passing by (not an official reply). My guess is that too many frames are being fed to the model in one call. Take the 1 s video you mention: at 1280×720 and 30 frames, the required GPU memory is already substantial, on the order of tens of GB (a rough estimate follows after this list). So you need to check:

  1. When editor.infer receives a video, does it decode all frames and run the model on all of them at once, or does it use the max_seq_len parameter to process them window by window? The logic is here: https://github.com/open-mmlab/mmagic/blob/main/mmagic/apis/inferencers/video_restoration_inferencer.py#L126
  2. If 1. passes all frames at once, the OOM comes from the frame count; if it already passes only a subset, then at your resolution you need to make that subset smaller (the max_seq_len parameter) to avoid the OOM (a sketch follows below).
     To change max_seq_len, you need to find out how to pass arguments through to the inferencer; the overall model-argument logic is at https://github.com/open-mmlab/mmagic/blob/main/mmagic/apis/mmagic_inferencer.py#L150, which appears to read a default config file.
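
A back-of-envelope check of the "tens of GB" estimate above. The numbers are illustrative assumptions: 30 decoded frames, fp32 activations, and the 64-channel width implied by the c64 in the checkpoint name:

# Rough activation-memory estimate when a whole clip is fed at once.
frames, channels = 30, 64        # 1 s at 30 fps; RealBasicVSR c64 width
height, width = 720, 1280        # input resolution from the report
bytes_per_el = 4                 # fp32

one_feature_map = frames * channels * height * width * bytes_per_el
print(f'{one_feature_map / 1024**3:.1f} GiB per feature tensor')  # ~6.6 GiB

# The cleaning module and the bidirectional propagation keep several
# such tensors alive at once, and the 4x-upscaled output has 16x the
# pixels, so a peak of tens of GiB is plausible.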

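A minimal sketch of the windowed-inference fix described above. The extra_parameters argument and the max_seq_len key are read off the linked inferencer source; treat both names as assumptions and verify them against the installed mmagic version:

import os
import time
from mmagic.apis import MMagicInferencer
from mmengine import mkdir_or_exist

video = '/home/txy/code/blur/video/6.mp4'
result_out_dir = '/home/txy/code/blur/output/6.mp4'
mkdir_or_exist(os.path.dirname(result_out_dir))

# max_seq_len bounds how many frames enter the model per forward pass;
# the inferencer then processes the clip window by window instead of
# all at once. Smaller values lower peak GPU memory at some speed cost.
# (Key name assumed from video_restoration_inferencer.py.)
editor = MMagicInferencer(
    'real_basicvsr',
    device='cuda:1',
    extra_parameters=dict(max_seq_len=4))

beg = time.time()
results = editor.infer(video=video, result_out_dir=result_out_dir)
print(f'inference took {time.time() - beg:.1f} s')
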
@txy00001
Author

> Just passing by (not an official reply). My guess is that too many frames are being fed to the model in one call. […]

Thanks, I'll try it.
