
[How to release cpu memory after session Run] #20640

Open
wangzhenlin123 opened this issue May 10, 2024 · 1 comment
Labels
platform:windows issues related to the Windows platform

Comments


wangzhenlin123 commented May 10, 2024

Describe the issue

Hi, here is a very common situation: after running inference with ONNX Runtime, the process holds on to nearly 2 GB of system memory (not GPU memory) that cannot be released. I have tried many ways to release it, but none of them solved the problem. Does ONNX Runtime not provide a mechanism to release CPU memory after inference?

```cpp
// Assumes: #include <onnxruntime_cxx_api.h>, OpenCV (cv::blobFromImage),
// and using namespace Ort / std / cv. `mpath` and `image` are defined elsewhere.
Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_ERROR, "yolov8");
Ort::SessionOptions sessionOptions = SessionOptions();

OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0);
sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);

Session* session = new Session(env, wstring(mpath.begin(), mpath.end()).c_str(), sessionOptions);
vector<const char*> input_names = { "images" };
vector<const char*> output_names = { "output0", "output1" };
vector<int64_t> input_shape = { 1, 3, 640, 640 };
Mat blob = blobFromImage(image, 1 / 255.0, Size(640, 640), Scalar(0, 0, 0), true, false);
Value input_tensor = Value::CreateTensor<float>(MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault),
    (float*)blob.data, 3 * 640 * 640, input_shape.data(), input_shape.size());
for (int i = 0; i < 100; i++)
{
    auto start = chrono::high_resolution_clock::now();
    auto outputs = session->Run(RunOptions{ nullptr }, input_names.data(), &input_tensor, 1, output_names.data(), output_names.size());
    auto end = chrono::high_resolution_clock::now();
    auto duration = chrono::duration_cast<chrono::milliseconds>(end - start).count();
    cout << "ort time: " << duration << " millis.";
}

// Attempts to free the memory after inference:
input_tensor.release();
sessionOptions.release();

session->release();
delete session;
session = nullptr;

env.release();
```

To reproduce

This is a common and recurring issue across many versions.

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider platform:windows issues related to the Windows platform labels May 10, 2024
@sophies927 sophies927 removed the ep:CUDA issues related to the CUDA execution provider label May 16, 2024
@edgchen1 (Contributor) commented:

When using the C++ API, you probably do not want to call release() and then discard the returned value; doing so leaks the underlying resource.

```cpp
/// \brief Relinquishes ownership of the contained C object pointer
/// The underlying object is not destroyed
contained_type* release() {
  T* p = p_;
  p_ = nullptr;
  return p;
}
```

The underlying C API release function should get called automatically when the C++ API object goes out of scope.
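To make the ownership rule concrete, here is a minimal self-contained sketch. It does not use onnxruntime at all; `Handle`, `OrtThing`, `CreateThing`, and `ReleaseThing` are hypothetical stand-ins that mirror the wrapper semantics quoted above. It shows why calling release() and discarding the result leaks, while simply letting the wrapper go out of scope frees the object:

```cpp
#include <cassert>

// Hypothetical stand-in for an underlying C API object.
struct OrtThing { };

static int g_live = 0;  // counts live underlying objects

OrtThing* CreateThing() { ++g_live; return new OrtThing; }
void ReleaseThing(OrtThing* p) { --g_live; delete p; }

// Toy RAII wrapper mirroring the Ort:: C++ wrapper semantics.
class Handle {
public:
    Handle() : p_(CreateThing()) {}
    ~Handle() { if (p_) ReleaseThing(p_); }  // destructor frees the object

    // Relinquishes ownership; the underlying object is NOT destroyed.
    OrtThing* release() {
        OrtThing* p = p_;
        p_ = nullptr;
        return p;
    }

private:
    OrtThing* p_;
};

// Letting the handle go out of scope frees the object: no leak.
int scoped_use_live_count() {
    { Handle h; }          // destructor runs at the closing brace
    return g_live;         // 0: nothing left alive
}

// Calling release() and discarding the result: the destructor becomes a
// no-op, and nothing ever frees the object.
int release_and_discard_live_count() {
    { Handle h; h.release(); }
    return g_live;         // 1: the object leaked
}
```

Applied to the original code, this suggests dropping the explicit release() calls and instead giving env, sessionOptions, input_tensor, and a stack-allocated Ort::Session a scope that ends when inference is done, so their destructors free the underlying C objects.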
