[Question] Is `IOutputAllocator::reallocateOutput` guaranteed to be called before `context->enqueueV3` returns? #3875

mirzadeh · 2024-05-16T19:19:20Z

Description

I cannot find any information regarding when IOutputAllocator::reallocateOutput is called with respect to context->enqueueV3. Is there any guarantee that this function is called before enqueueV3 returns, or should I explicitly synchronize the stream?

In other words, in the following pseudo-code:

context->setOutputAllocator(name, allocator);
// ...
context->enqueueV3(stream);
cudaStreamSynchronize(stream); // <-- Is this necessary?

// Memcpy device -> host. Is it valid to ask the allocator for the device buffer without stream synchronization?
cudaMemcpyAsync(hostBuffer, allocator->getDeviceBuffer(), ...);

Should I explicitly synchronize the stream after enqueueV3 for the device pointer returned by allocator->getDeviceBuffer() to be valid? Or is allocator->reallocateOutput guaranteed to be called before enqueueV3 returns, in which case stream synchronization is unnecessary?
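For readers following along: getDeviceBuffer() in the pseudo-code is not a TensorRT call, so it is presumably a method on the poster's own allocator class. Below is a minimal sketch of what such an allocator might look like. Everything here is an assumption for illustration: the class name GrowOnlyOutputAllocator and the reallocCalls() counter are hypothetical, the class does not actually derive from nvinfer1::IOutputAllocator, and host malloc/free stand in for cudaMalloc/cudaFree so the sketch builds without CUDA. Only the parameter list of reallocateOutput is modeled on the TensorRT interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a class derived from nvinfer1::IOutputAllocator.
// Host malloc/free replace cudaMalloc/cudaFree (assumption, for portability).
class GrowOnlyOutputAllocator {
public:
    GrowOnlyOutputAllocator() = default;
    GrowOnlyOutputAllocator(GrowOnlyOutputAllocator const&) = delete;
    GrowOnlyOutputAllocator& operator=(GrowOnlyOutputAllocator const&) = delete;
    ~GrowOnlyOutputAllocator() { std::free(buffer_); }

    // Parameter list modeled on IOutputAllocator::reallocateOutput.
    // Returns the buffer the runtime should write the output into.
    void* reallocateOutput(char const* /*tensorName*/, void* /*currentMemory*/,
                           std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0);
        (void)alignment;  // alignment handling is omitted in this sketch
        ++reallocCalls_;
        std::uint64_t const wanted = (size == 0) ? 1 : size;  // never request 0 bytes
        if (wanted > capacity_) {
            // Grow only: the pointer changes here, which is why the caller
            // must not cache a pointer obtained before this call.
            std::free(buffer_);
            buffer_ = std::malloc(static_cast<std::size_t>(wanted));
            capacity_ = (buffer_ != nullptr) ? wanted : 0;
        }
        return buffer_;
    }

    // Accessor used by the pseudo-code in the question.
    void* getDeviceBuffer() const { return buffer_; }

    // Hypothetical helper: lets a caller check whether the callback has run.
    int reallocCalls() const { return reallocCalls_; }

private:
    void* buffer_ = nullptr;
    std::uint64_t capacity_ = 0;
    int reallocCalls_ = 0;
};
```

With an allocator like this, the question reduces to whether buffer_ has already been updated by the time the caller reads it after enqueueV3.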


zerollzeng · 2024-05-26T11:00:02Z

Please refer to our API doc: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#aa174ba57c44df821625ce4d3317dd7aa

> should I explicitly synchronize stream?

Yes.

> Should I explicitly synchronize the stream after enqueueV3 for device allocator->getDeviceBuffer() to be valid?

The pointer stays valid until you free the memory, but the output data is only guaranteed to be correct after the stream has been synchronized.

mirzadeh · 2024-05-28T15:07:39Z

I think my question was more about the calling order of reallocateOutput and enqueueV3. Since enqueueV3 is asynchronous, is it possible that by the time cudaMemcpy is called, TensorRT has still not called reallocateOutput, and the device pointer is therefore invalid (because reallocateOutput might return a different pointer)?

If there is a guarantee that reallocateOutput has always been called by the time enqueueV3 returns, there is no need for an explicit synchronization before the memcpy.
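The thread does not settle whether that guarantee exists, so one ordering that does not depend on it is to query the allocator only after synchronizing. The sketch below models that ordering without TensorRT or CUDA. It is a deliberately pessimistic toy, not a description of how TensorRT behaves: FakeStream, FakeOutput, and runAndCopy are hypothetical names, and the fake stream defers all queued work (including the allocation callback) until synchronize() is called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model of a stream: queued work runs only inside synchronize().
// This is the worst case for the caller, assumed purely for illustration.
class FakeStream {
public:
    void enqueue(std::function<void()> work) { pending_.push(std::move(work)); }
    void synchronize() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical output holder: ptr plays the role of allocator->getDeviceBuffer().
struct FakeOutput {
    std::vector<float> storage;
    float* ptr = nullptr;
};

// Enqueue, synchronize, and only then read the pointer and copy.
std::vector<float> runAndCopy(FakeStream& stream, FakeOutput& out, std::size_t n) {
    stream.enqueue([&out, n] {
        out.storage.assign(n, 1.0f);   // stands in for reallocateOutput + kernel writes
        out.ptr = out.storage.data();  // the pointer may change at this step
    });
    stream.synchronize();              // stands in for cudaStreamSynchronize(stream)
    float const* src = out.ptr;        // queried after the sync, never before
    assert(n == 0 || src != nullptr);
    return std::vector<float>(src, src + n);  // stands in for the device -> host copy
}
```

In this model, reading out.ptr between enqueue() and synchronize() would see a stale or null pointer, which is exactly the hazard the question describes. Reading it after the synchronization is safe regardless of when the callback actually runs.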

mirzadeh changed the title ~~[Question] Is IOutputAllocator::reallocateOutput guaranteed to be called before context->enqueueV3 returns?~~ [Question] Is IOutputAllocator::reallocateOutput guaranteed to be called before context->enqueueV3 returns? May 16, 2024

zerollzeng self-assigned this May 26, 2024

zerollzeng added the triaged Issue has been triaged by maintainers label May 26, 2024
