
[How to release cpu memory after session Run] #20640

Open
wangzhenlin123 opened this issue May 10, 2024 · 1 comment
Labels
platform:windows issues related to the Windows platform

Comments


wangzhenlin123 commented May 10, 2024

Describe the issue

Hi, here is a very common situation: after running inference with ONNX Runtime, the process holds on to nearly 2 GB of system memory (not GPU memory) that cannot be released. I have tried many ways to release it, but none of them solved the problem. Does ONNX Runtime not provide a mechanism to release CPU memory after inference?

```cpp
// Assumes: #include <onnxruntime_cxx_api.h>, OpenCV (cv::blobFromImage),
// and using namespace Ort / std / cv. `mpath` and `image` are defined elsewhere.
Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_ERROR, "yolov8");
Ort::SessionOptions sessionOptions = SessionOptions();

OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0);
sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);

Session* session = new Session(env, wstring(mpath.begin(), mpath.end()).c_str(), sessionOptions);
vector<const char*> input_names = { "images" };
vector<const char*> output_names = { "output0", "output1" };
vector<int64_t> input_shape = { 1, 3, 640, 640 };
Mat blob = blobFromImage(image, 1 / 255.0, Size(640, 640), Scalar(0, 0, 0), true, false);
Value input_tensor = Value::CreateTensor<float>(MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault),
    (float*)blob.data, 3 * 640 * 640, input_shape.data(), input_shape.size());
for (int i = 0; i < 100; i++)
{
    auto start = chrono::high_resolution_clock::now();
    auto outputs = session->Run(RunOptions{ nullptr }, input_names.data(), &input_tensor, 1, output_names.data(), output_names.size());
    auto end = chrono::high_resolution_clock::now();
    auto duration = chrono::duration_cast<chrono::milliseconds>(end - start).count();
    cout << "ort time: " << duration << " millis.";
}

// Attempts to free the memory after inference:
input_tensor.release();
sessionOptions.release();

session->release();
delete session;
session = nullptr;

env.release();
```

To reproduce

This is a common and recurring issue across many versions.

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider platform:windows issues related to the Windows platform labels May 10, 2024
@sophies927 sophies927 removed the ep:CUDA issues related to the CUDA execution provider label May 16, 2024
@edgchen1 (Contributor) commented:

When using the C++ API, you probably do not want to call release() and then discard the returned value; doing so leaks the underlying resource.

```cpp
/// \brief Relinquishes ownership of the contained C object pointer
/// The underlying object is not destroyed
contained_type* release() {
  T* p = p_;
  p_ = nullptr;
  return p;
}
```

The underlying C API release function should get called automatically when the C++ API object goes out of scope.
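To make the ownership rule concrete, here is a minimal self-contained sketch. It does not use onnxruntime at all; `Handle`, `OrtThing`, `CreateThing`, and `ReleaseThing` are hypothetical stand-ins that mirror the wrapper semantics quoted above. It shows why calling release() and discarding the result leaks, while simply letting the wrapper go out of scope frees the object:

```cpp
#include <cassert>

// Hypothetical stand-in for an underlying C API object.
struct OrtThing { };

static int g_live = 0;  // counts live underlying objects

OrtThing* CreateThing() { ++g_live; return new OrtThing; }
void ReleaseThing(OrtThing* p) { --g_live; delete p; }

// Toy RAII wrapper mirroring the Ort:: C++ wrapper semantics.
class Handle {
public:
    Handle() : p_(CreateThing()) {}
    ~Handle() { if (p_) ReleaseThing(p_); }  // destructor frees the object

    // Relinquishes ownership; the underlying object is NOT destroyed.
    OrtThing* release() {
        OrtThing* p = p_;
        p_ = nullptr;
        return p;
    }

private:
    OrtThing* p_;
};

// Letting the handle go out of scope frees the object: no leak.
int scoped_use_live_count() {
    { Handle h; }          // destructor runs at the closing brace
    return g_live;         // 0: nothing left alive
}

// Calling release() and discarding the result: the destructor becomes a
// no-op, and nothing ever frees the object.
int release_and_discard_live_count() {
    { Handle h; h.release(); }
    return g_live;         // 1: the object leaked
}
```

Applied to the original code, this suggests dropping the explicit release() calls and instead giving env, sessionOptions, input_tensor, and a stack-allocated Ort::Session a scope that ends when inference is done, so their destructors free the underlying C objects.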
