
Converting engine file from onnx file with ReduceMax failure of TensorRT 8.5.10 when running trtexec on GPU Orin #3866

Open
JYS997760473 opened this issue May 15, 2024 · 7 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@JYS997760473

JYS997760473 commented May 15, 2024

Description

I tried to generate an engine file from an ONNX file on the Orin GPU, but it failed:
[05/15/2024-11:45:16] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[05/15/2024-11:45:16] [E] Saving engine to file failed.
[05/15/2024-11:45:16] [E] Engine set up failed

Environment

TensorRT Version:

NVIDIA GPU:

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
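For reference, a minimal sketch of such a check with ONNXRuntime (the file name and input shape here are placeholders, not the actual model):

```python
# Minimal ONNXRuntime sanity check for the exported model.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
x = np.random.rand(12, 20, 12).astype(np.float32)  # adjust to the model's real input shape
outputs = sess.run(None, {inp.name: x})
for meta, out in zip(sess.get_outputs(), outputs):
    print(meta.name, out.shape)
```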

@lix19937

Please add --verbose to get a more detailed log.

@JYS997760473
Author

> Please add --verbose to get a more detailed log.

Hi, I replaced the original nn.LayerNorm block with an nn.BatchNorm block. My new network's ONNX graph is:
[screenshot of the new ONNX graph]
According to the documentation at https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/tree/main/operators, the BatchNormalization operator is natively supported by NVIDIA DLA, but when I try to generate an engine file from this ONNX model, it still fails (a sketch of the swap is at the end of this comment). The end of the build log is:

[05/15/2024-20:44:50] [V] [TRT] Layer: MaxPool_5 Host Persistent: 1408 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_12 Host Persistent: 6752 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_13 || Gemm_14 Host Persistent: 5664 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_15 Host Persistent: 6752 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: PWN(onnx::Div_41 + (Unnamed Layer* 33) [Shuffle], Div_17) Host Persistent: 244 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_19 Host Persistent: 6048 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_20 Host Persistent: 6048 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_21 Host Persistent: 6048 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Skipped printing memory information for 22 layers with 0 memory size i.e. Host Persistent + Device Persistent + Scratch Memory == 0.
[05/15/2024-20:44:50] [I] [TRT] Total Host Persistent Memory: 45280
[05/15/2024-20:44:50] [I] [TRT] Total Device Persistent Memory: 0
[05/15/2024-20:44:50] [I] [TRT] Total Scratch Memory: 0
[05/15/2024-20:44:50] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 132 MiB
[05/15/2024-20:44:50] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 29 steps to complete.
[05/15/2024-20:44:50] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.337024ms to assign 7 blocks to 29 nodes requiring 126464 bytes.
[05/15/2024-20:44:50] [V] [TRT] Total number of blocks in optimized block assignment: 7
[05/15/2024-20:44:50] [I] [TRT] Total Activation Memory: 126464
[05/15/2024-20:44:50] [V] [TRT] Finalize: MatMul_0 Set kernel index: 0
[05/15/2024-20:44:50] [V] [TRT] Finalize: MaxPool_5 Set kernel index: 1
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_12 Set kernel index: 2
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_13 || Gemm_14 Set kernel index: 3
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_15 Set kernel index: 2
[05/15/2024-20:44:50] [V] [TRT] Finalize: PWN(onnx::Div_41 + (Unnamed Layer* 33) [Shuffle], Div_17) Set kernel index: 4
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_19 Set kernel index: 5
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_20 Set kernel index: 6
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_21 Set kernel index: 6
[05/15/2024-20:44:50] [V] [TRT] Total number of generated kernels selected for the engine: 7
[05/15/2024-20:44:50] [V] [TRT] Kernel: 0 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 1 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 2 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 3 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 4 TRT_SERIALIZABLE:generatedNativePointwise
[05/15/2024-20:44:50] [V] [TRT] Kernel: 5 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 6 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: CUDNN
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: CUBLAS, CUBLAS_LT
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: JIT_CONVOLUTIONS
[05/15/2024-20:44:50] [V] [TRT] Engine generation completed in 10.7422 seconds.
[05/15/2024-20:44:50] [V] [TRT] Deleting timing cache: 141 entries, served 42 hits since creation.
[05/15/2024-20:44:50] [V] [TRT] Engine Layer Information:
Layer(NoOp): reshape_before_MatMul_0, Tactic: 0x0000000000000000, x (Float[12,20,12]) -> reshape_before_MatMul_0_out_tensor (Float[240,12,1,1])
Layer(NoOp): Reformatting CopyNode for Input Tensor 0 to MatMul_0, Tactic: 0x0000000000000000, reshape_before_MatMul_0_out_tensor (Float[240,12,1,1]) -> Reformatted Input Tensor 0 to MatMul_0 (Float[240,12:4,1,1])
Layer(CaskGemmConvolution): MatMul_0, Tactic: 0x00000000000201d1, Reformatted Input Tensor 0 to MatMul_0 (Float[240,12:4,1,1]) -> MatMul_0_out_tensor (Float[240,64:4,1,1])
Layer(NoOp): Reformatting CopyNode for Input Tensor 0 to reshape_after_MatMul_0, Tactic: 0x0000000000000000, MatMul_0_out_tensor (Float[240,64:4,1,1]) -> Reformatted Input Tensor 0 to reshape_after_MatMul_0 (Float[240,64,1,1])
Layer(NoOp): reshape_after_MatMul_0, Tactic: 0x0000000000000000, Reformatted Input Tensor 0 to reshape_after_MatMul_0 (Float[240,64,1,1]) -> onnx::Add_25 (Float[12,20,64])
Layer(Constant): backbone.subgraph.linear.bias + (Unnamed Layer* 4) [Shuffle], Tactic: 0x0000000000000000,  -> (Unnamed Layer* 4) [Shuffle]_output (Float[1,1,64])
Layer(ElementWise): Add_1, Tactic: 0x0000000000000001, (Unnamed Layer* 4) [Shuffle]_output (Float[1,1,64]), onnx::Add_25 (Float[12,20,64]) -> input (Float[12,20,64])
Layer(NoOp): (Unnamed Layer* 6) [Shuffle], Tactic: 0x0000000000000000, input (Float[12,20,64]) -> (Unnamed Layer* 6) [Shuffle]_output (Float[12,20,64,1])
Layer(Scale): BatchNormalization_2 + Relu_3, Tactic: 0x0000000000000000, (Unnamed Layer* 6) [Shuffle]_output (Float[12,20,64,1]) -> Relu_3_out_tensor (Float[12,20,64,1])
Layer(NoOp): squeeze_after_Relu_3, Tactic: 0x0000000000000000, Relu_3_out_tensor (Float[12,20,64,1]) -> squeeze_after_Relu_3_out_tensor (Float[12,20,64])
Layer(Shuffle): Transpose_4 + (Unnamed Layer* 11) [Shuffle], Tactic: 0x0000000000000000, squeeze_after_Relu_3_out_tensor (Float[12,20,64]) -> (Unnamed Layer* 11) [Shuffle]_output (Float[12,64,20,1])
Layer(CaskPooling): MaxPool_5, Tactic: 0x5faf4a0a8a5670ed, (Unnamed Layer* 11) [Shuffle]_output (Float[12,64,20,1]) -> (Unnamed Layer* 12) [Pooling]_output (Float[12,64,1,1])
Layer(NoOp): (Unnamed Layer* 13) [Shuffle] + Squeeze_6, Tactic: 0x0000000000000000, (Unnamed Layer* 12) [Pooling]_output (Float[12,64,1,1]) -> x.1 (Float[12,64])
Layer(Reformat): reshape_before_Gemm_12_copy_input, Tactic: 0x00000000000003e8, x.1 (Float[1,64]) -> reshape_before_Gemm_12_copy_input (Float[1,64])
Layer(NoOp): reshape_before_Gemm_12, Tactic: 0x0000000000000000, reshape_before_Gemm_12_copy_input (Float[1,64]) -> reshape_before_Gemm_12_out_tensor (Float[1,64,1,1])
Layer(CaskGemmConvolution): Gemm_12, Tactic: 0x000000000002034f, reshape_before_Gemm_12_out_tensor (Float[1,64,1,1]) -> Gemm_12_out_tensor (Float[1,32,1,1])
Layer(NoOp): reshape_after_Gemm_12, Tactic: 0x0000000000000000, Gemm_12_out_tensor (Float[1,32,1,1]) -> onnx::Gemm_37 (Float[1,32])
Layer(NoOp): reshape_before_Gemm_13, Tactic: 0x0000000000000000, x.1 (Float[12,64]) -> reshape_before_Gemm_13_out_tensor (Float[12,64,1,1])
Layer(CaskGemmConvolution): Gemm_13 || Gemm_14, Tactic: 0x00000000000204df, reshape_before_Gemm_13_out_tensor (Float[12,64,1,1]) -> Gemm_13 || Gemm_14 (Float[12,64,1,1])
Layer(Reformat): reshape_after_Gemm_13_copy_input, Tactic: 0x00000000000003e8, Gemm_13 || Gemm_14 (Float[12,32,1,1]) -> reshape_after_Gemm_13_copy_input (Float[12,32,1,1])
Layer(NoOp): reshape_after_Gemm_13, Tactic: 0x0000000000000000, reshape_after_Gemm_13_copy_input (Float[12,32,1,1]) -> onnx::Gemm_38 (Float[12,32])
Layer(Reformat): reshape_after_Gemm_14_copy_input, Tactic: 0x00000000000003e8, Gemm_13 || Gemm_14 (Float[12,32,1,1]) -> reshape_after_Gemm_14_copy_input (Float[12,32,1,1])
Layer(NoOp): reshape_after_Gemm_14, Tactic: 0x0000000000000000, reshape_after_Gemm_14_copy_input (Float[12,32,1,1]) -> onnx::Gemm_39 (Float[12,32])
Layer(CaskGemmMatrixMultiply): Gemm_15, Tactic: 0x000000000002034f, onnx::Gemm_37 (Float[1,32]), onnx::Gemm_38 (Float[12,32]) -> onnx::Div_40 (Float[1,12])
Layer(PointWiseV2): PWN(onnx::Div_41 + (Unnamed Layer* 33) [Shuffle], Div_17), Tactic: 0x000000000000001c, onnx::Div_40 (Float[1,12]) -> scores (Float[1,12])
Layer(CudaSoftMax): Softmax_18, Tactic: 0x00000000000003e9, scores (Float[1,12]) -> (Unnamed Layer* 36) [Softmax]_output (Float[1,12])
Layer(CaskGemmMatrixMultiply): Gemm_19, Tactic: 0x00000000000203be, (Unnamed Layer* 36) [Softmax]_output (Float[1,12]), onnx::Gemm_39 (Float[12,32]) -> onnx::Gemm_44 (Float[1,32])
Layer(NoOp): reshape_before_Gemm_20, Tactic: 0x0000000000000000, onnx::Gemm_44 (Float[1,32]) -> reshape_before_Gemm_20_out_tensor (Float[1,32,1,1])
Layer(CaskGemmConvolution): Gemm_20, Tactic: 0x000000000002014b, reshape_before_Gemm_20_out_tensor (Float[1,32,1,1]) -> Gemm_20_out_tensor (Float[1,32,1,1])
Layer(CaskGemmConvolution): Gemm_21, Tactic: 0x000000000002014b, Gemm_20_out_tensor (Float[1,32,1,1]) -> Gemm_21_out_tensor (Float[1,30,1,1])
Layer(NoOp): reshape_after_Gemm_21, Tactic: 0x0000000000000000, Gemm_21_out_tensor (Float[1,30,1,1]) -> reg (Float[1,30])
[05/15/2024-20:44:50] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[05/15/2024-20:44:50] [E] Saving engine to file failed.
[05/15/2024-20:44:50] [E] Engine set up failed

Please check and have a nice day
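
For reference, a minimal sketch of the kind of swap described above; the module layout, names, and dimensions here are hypothetical, not the actual network:

```python
import torch
import torch.nn as nn

# Hypothetical block illustrating the swap: replace nn.LayerNorm over the
# last dimension with nn.BatchNorm1d over the channel dimension.
class Block(nn.Module):
    def __init__(self, hidden=64, use_batchnorm=True):
        super().__init__()
        self.linear = nn.Linear(12, hidden)
        # BatchNorm1d expects (N, C, L), so the tensor is transposed around it.
        self.norm = nn.BatchNorm1d(hidden) if use_batchnorm else nn.LayerNorm(hidden)
        self.use_batchnorm = use_batchnorm
        self.act = nn.ReLU()

    def forward(self, x):            # x: (N, L, 12)
        x = self.linear(x)           # (N, L, hidden)
        if self.use_batchnorm:
            x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        else:
            x = self.norm(x)
        return self.act(x)

torch.onnx.export(Block().eval(), torch.randn(12, 20, 12), "block.onnx", opset_version=13)
```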

@JYS997760473
Author

And if I remove the LayerNorm or BatchNormalization block, the engine file can be generated successfully.

@lix19937

lix19937 commented May 17, 2024

You can try to convert these two modules (the LayerNorm or BatchNormalization block, exported as a standalone subgraph ONNX) separately, as sketched below.
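
For example, a minimal sketch of exporting just the normalization block on its own (the feature size, input shape, and file names are placeholders):

```python
import torch
import torch.nn as nn

# Export only the normalization block so the failing layer can be isolated.
norm = nn.BatchNorm1d(64).eval()     # or nn.LayerNorm(64) for the other variant
dummy = torch.randn(12, 64, 20)      # (N, C, L) layout for BatchNorm1d
torch.onnx.export(norm, dummy, "norm_only.onnx", opset_version=13)
# Then try: trtexec --onnx=norm_only.onnx --saveEngine=norm_only.engine --verbose
```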

@zerollzeng
Collaborator

> [05/15/2024-20:44:50] [E] Saving engine to file failed.

no disk space?
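
For example, free space at the intended engine output location can be checked from Python (the path is a placeholder):

```python
import shutil

# Placeholder path: use the directory passed to trtexec --saveEngine.
total, used, free = shutil.disk_usage("/path/to/engine/output")
print(f"free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
```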

@zerollzeng zerollzeng self-assigned this May 17, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label May 17, 2024
@JYS997760473
Author

> [05/15/2024-20:44:50] [E] Saving engine to file failed.
>
> no disk space?

Hi, thanks for your reply. I tried again with a new .pt file and was able to create the engine file successfully.
One more thing I would like to clarify: as of now, can we not use the LayerNormalization operator on DRIVE Orin unless I write a TensorRT plugin myself?

@zerollzeng
Collaborator

Please check our release notes; I think you need at least TRT 8.6 or 9.0, I can't remember exactly which one.
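
For reference, a quick sketch to check the installed TensorRT version and probe whether the Python API exposes a native normalization layer (present only in the newer releases mentioned above):

```python
import tensorrt as trt

print("TensorRT version:", trt.__version__)

# Probe for INetworkDefinition.add_normalization, which only exists in
# releases that added native LayerNorm support.
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
print("native normalization layer available:", hasattr(network, "add_normalization"))
```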
