
CUDA out of memory #2807

Open
zkdfbb opened this issue Mar 5, 2024 · 3 comments

Comments

@zkdfbb

zkdfbb commented Mar 5, 2024

Hello, I want to evaluate my model after wrapping it with QuantizationSimModel, but I encountered a CUDA out of memory error. Normally, evaluating the model requires 7 GB of VRAM, but after quantization even 80 GB is not enough. How can I solve this problem?

My environment is as follows:
python: 3.8.10
pytorch: 2.2.0
aimet: 1.30.0

Another problem: when I export the quantized model to ONNX, there is a warning message:

[W shape_type_inference.cpp:1973] Warning: The shape inference of aimet_torch::CustomMarker type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)

It seems to have no impact on the exported ONNX model, but I would still like to know whether this message can be eliminated.

@quic-klhsieh
Contributor

@zkdfbb , can you share some details about your workflow/pipeline? Are you simply instantiating your model, passing it into QuantizationSimModel, computing encodings, and then using qsim.model(...) to run evaluation?
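
In other words, something along these lines (a schematic sketch only; MyModel, calibration_fn, and dataloader are placeholders for your own objects):

import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = MyModel().cuda()                      # MyModel is a placeholder
dummy_input = torch.randn([1, 3, 224, 224]).cuda()

qsim = QuantizationSimModel(model=model,
                            quant_scheme=QuantScheme.post_training_tf,
                            dummy_input=dummy_input)

# calibration_fn / dataloader are placeholders for your calibration setup
qsim.compute_encodings(forward_pass_callback=calibration_fn,
                       forward_pass_callback_args=dataloader)

# Evaluation through the sim model, with gradients disabled
with torch.no_grad():
    outputs = qsim.model(dummy_input)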

It would also help if you could provide some other details:

  • the size of the original model
  • the size of the quantized model
  • whether you are using any range-learning quantization scheme in the quantsim init
  • how much memory the original model and the quantized model take for a forward pass when evaluation is done inside a with torch.no_grad() scope (see the measurement sketch below)
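
For reference, a minimal sketch of how that last measurement could be taken with PyTorch's built-in memory counters; model and dataloader are placeholders for your own objects:

import contextlib
import torch

def measure_peak_forward_memory(model, dataloader, use_no_grad=True):
    """Peak GPU memory (MB) for one evaluation pass over the dataloader."""
    model.eval()
    torch.cuda.reset_peak_memory_stats()
    ctx = torch.no_grad() if use_no_grad else contextlib.nullcontext()
    with ctx:
        for inputs in dataloader:
            model(inputs.cuda())
    # Peak GPU memory in MB since the reset above
    return torch.cuda.max_memory_allocated() / 1024 ** 2

Calling this once with use_no_grad=True and once with use_no_grad=False, for both the original and the quantized model, would give us the comparison we are after.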

Regarding the ONNX warning message, we have not yet looked into how to silence these warnings. We can mark it as a to-do item for a better user experience.

@zkdfbb
Author

zkdfbb commented Mar 8, 2024

@quic-klhsieh, you can try to reproduce this with the code below.

I have solved the problem. I was using @torch.no_grad to decorate a function that included both the model forward pass and the post-processing. Normally that works fine, but it failed with AIMET. After I wrapped only the model forward pass in with torch.no_grad(), the problem was solved.
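
For anyone hitting the same thing, a minimal sketch of the two patterns (postprocess is a placeholder for whatever runs after the forward pass):

import torch

# Pattern that ran out of memory for me with AIMET: the decorator
# disables gradients for the whole function, forward and post-processing.
@torch.no_grad()
def evaluate_decorated(model, inputs):
    outputs = model(inputs.cuda())
    return postprocess(outputs)  # postprocess is a placeholder

# Pattern that worked: only the model forward runs under no_grad.
def evaluate_scoped(model, inputs):
    with torch.no_grad():
        outputs = model(inputs.cuda())
    return postprocess(outputs)  # postprocess is a placeholder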

But I think there is still a problem with quantization-aware training: GPU memory grows so fast that training quickly becomes unusable.

import torch
from tqdm import tqdm
from torchvision.models.resnet import resnet152
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel


# Synthetic dataset of 160 random ImageNet-sized inputs
class Dataset(torch.utils.data.Dataset):
    def __len__(self):
        return 160

    def __getitem__(self, idx):
        return torch.randn([3, 224, 224])


# Forward-pass callback for compute_encodings; runs under no_grad
def compute_forward(model, dataloader):
    model.eval()
    with torch.no_grad():
        for inputs in tqdm(dataloader):
            model(inputs.cuda())
    model.train()

dataset = Dataset()
dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
model = resnet152().cuda()

dummy_input = torch.randn([1, 3, 224, 224]).cuda()
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           dummy_input=dummy_input,
                           rounding_mode='nearest',
                           default_output_bw=8,
                           default_param_bw=8,
                           in_place=False)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
sim.compute_encodings(forward_pass_callback=compute_forward, forward_pass_callback_args=dataloader)
# GPU: 1587 MB

model = sim.model
model.eval()  # eval() alone does not disable autograd bookkeeping
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
data_iter = iter(dataloader)
inputs = next(data_iter)
model(inputs.cuda())
# GPU: 13337 MB

inputs = next(data_iter)
model(inputs.cuda())
# GPU: 25811 MB

inputs = next(data_iter)
model(inputs.cuda())
# GPU: 38289 MB
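
For comparison, the same evaluation loop with the fix applied; with each forward pass inside torch.no_grad(), GPU memory no longer grew between batches for me:

# Same evaluation, but with each forward pass inside torch.no_grad()
model = sim.model
model.eval()
data_iter = iter(dataloader)

for _ in range(3):
    inputs = next(data_iter)
    with torch.no_grad():
        model(inputs.cuda())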

@quic-klhsieh
Contributor

@zkdfbb Glad to hear torch.no_grad() worked for you. Thank you for the code snippet; we can take a look at why the memory continues to increase with each iteration.
