TRT8 -> TRT10, severe performance degradation. #3853

Open

Mr-Nineteen opened this issue May 10, 2024 · 3 comments

@Mr-Nineteen commented May 10, 2024

  • Verified model: Search and recommendation model.

  • Inference time for the TRT8 model:

    Percentile   Latency (ms)
    50%          12.860
    60%          13.192
    70%          13.513
    80%          14.229
    90%          15.736
    95%          16.618
    99%          19.204

  • Inference time for the TRT10 model:

    Percentile   Latency (ms)
    50%          37.349
    60%          39.082
    70%          41.100
    80%          43.191
    90%          46.483
    95%          49.919
    99%          56.780
  • Analysis of the main time-consuming part:

```cpp
// Prepare inputs: bind each input tensor's address and set its runtime
// shape on the TensorRT execution context.
for (size_t i = 0; i < inputs.size(); ++i) {
  const TFTensor& input = inputs[i];
  const std::string& input_name = input_names_[i];
  if (!context_->setTensorAddress(input_name.c_str(), input.data())) {
    return tf::errors::Internal(
        carbon::Printf("Failed to `setTensorAddress` for input name [%s]",
                       input_name.c_str()));
  }
  nvinfer1::Dims dims;
  TensorShapeToDims(input.shape(), &dims);
  if (!context_->setInputShape(input_name.c_str(), dims)) {
    return tf::errors::Internal(carbon::Printf(
        "Failed to `setInputShape` for name [%s]", input_name.c_str()));
  }
}
```

It is the `setInputShape` API that accounts for the increased time: with nearly a thousand inputs, it adds tens of milliseconds per inference.
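For reference, a minimal timing sketch of how the per-call cost could be attributed, assuming the same framework-specific names as the snippet above (`inputs`, `input_names_`, `context_`, `TFTensor`, `TensorShapeToDims`) and that `<chrono>` and `<cstdio>` are included; the accumulator variables are hypothetical:

```cpp
// Timing sketch (not the framework's actual code): accumulate the time
// spent in setTensorAddress vs. setInputShape across all inputs, so the
// regression can be pinned on one of the two calls.
double set_address_us = 0.0;
double set_shape_us = 0.0;

for (size_t i = 0; i < inputs.size(); ++i) {
  const TFTensor& input = inputs[i];
  const std::string& input_name = input_names_[i];

  auto t0 = std::chrono::steady_clock::now();
  context_->setTensorAddress(input_name.c_str(), input.data());
  auto t1 = std::chrono::steady_clock::now();

  nvinfer1::Dims dims;
  TensorShapeToDims(input.shape(), &dims);

  auto t2 = std::chrono::steady_clock::now();
  context_->setInputShape(input_name.c_str(), dims);
  auto t3 = std::chrono::steady_clock::now();

  set_address_us += std::chrono::duration<double, std::micro>(t1 - t0).count();
  set_shape_us   += std::chrono::duration<double, std::micro>(t3 - t2).count();
}

std::printf("setTensorAddress: %.1f us total, setInputShape: %.1f us total\n",
            set_address_us, set_shape_us);
```

Splitting the two calls out this way makes it clear whether the regression is concentrated in `setInputShape` itself or shared with the address binding.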

@lix19937

Why not use trtexec for benchmarking?
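For context, a typical trtexec run against a prebuilt engine looks roughly like the following; the engine path and input shape are placeholders:

```
trtexec --loadEngine=model.plan \
        --shapes=input_0:8x128 \
        --warmUp=500 --duration=30 \
        --percentile=99
```

This would give an apples-to-apples latency number independent of the custom framework's binding code.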

@Mr-Nineteen (Author) commented

@lix19937

TRT is integrated into a self-developed framework, which has its own benchmark tests and verification processes.

The issue is caused solely by the upgrade to TRT10.

@zerollzeng (Collaborator) commented

Could you please provide a reproducer? @Mr-Nineteen

zerollzeng self-assigned this on May 12, 2024
zerollzeng added the "triaged" label (Issue has been triaged by maintainers) on May 12, 2024