
Can't build python+onnx+tensorrtllm backends r24.04 #7236

Open
gulldan opened this issue May 17, 2024 · 3 comments
Labels
investigating — The development team is investigating this issue

Comments


gulldan commented May 17, 2024

I'm trying https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/compose.md
to build the onnx+python+tensorrtllm backends.

As mentioned in the doc, I run:

git clone --single-branch --depth=1 -b r24.04 https://github.com/triton-inference-server/server.git
python3 compose.py --backend onnxruntime --backend python --repoagent checksum --image min,nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 --image full,nvcr.io/nvidia/tritonserver:24.04-py3

It builds, but when I start the Triton server I get:

E0517 12:18:34.314931 164 model_lifecycle.cc:638] failed to load 'llama3_tensorrt_llm' version 1: Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the mode configuration.

Models with the python and onnx backends load correctly.
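For context, the "runtime" hint in that error refers to the model's config.pbtxt. A hypothetical sketch (the field and library names below are my assumptions, not from the docs) would be:

# config.pbtxt sketch for the llama3_tensorrt_llm model (hypothetical)
backend: "tensorrtllm"
# assumed: name of the backend shared library Triton should load for this model
runtime: "libtriton_tensorrtllm.so"

That alone can't help here, though, since the backend library itself is missing from the image.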

How can I build a combined Docker image that uses all of these backends?

python3 compose.py --backend tensorrtllm --backend python --backend onnxruntime --repoagent checksum --container-version 24.04

This fails too; the tensorrtllm backend is not found:

=> CACHED [stage-1 16/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/include include/  0.0s
=> CACHED [stage-1 17/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/python /opt/tritonserver/backends/python  0.0s
=> CACHED [stage-1 18/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/onnxruntime /opt/tritonserver/backends/onnxruntime  0.0s
=> ERROR [stage-1 19/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/tensorrtllm /opt/tritonserver/backends/tensorrtllm
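Presumably the COPY step fails because the full image used here (24.04-py3) does not ship the tensorrtllm backend; it only exists in the -trtllm- images. Would something like the following untested Dockerfile sketch be a reasonable workaround? (The composed image tag below is an assumption, and the backend probably also needs the TensorRT-LLM libraries from that image.)

# hypothetical follow-up Dockerfile, not from the docs
# stage 1: the TRT-LLM container that actually contains the backend
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 AS trtllm
# stage 2: the image produced by compose.py (tag assumed)
FROM tritonserver_composed
# copy only the backend directory from the TRT-LLM image
COPY --from=trtllm /opt/tritonserver/backends/tensorrtllm /opt/tritonserver/backends/tensorrtllm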
@statiraju added the investigating (The development team is investigating this issue) label on May 17, 2024
@statiraju

Tracking ticket: [DLIS-6397]

rmccorm4 (Collaborator) commented May 17, 2024

Hi @gulldan, compose.py doesn't currently support the TensorRT-LLM backend (DLIS-6397).

You should be able to achieve something similar by using build.py with:

--backend tensorrtllm:r24.04
--backend python:r24.04
--backend onnxruntime:r24.04

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html#building-with-docker

Let us know if this helps for your use case.

gulldan (Author) commented May 18, 2024

Thank you.

I tried

./build.py --backend tensorrtllm:r24.04 --backend python:r24.04 --backend onnxruntime:r24.04 --enable-gpu --build-type Release --target-platform linux --endpoint grpc --endpoint http

but it failed:
build_log.txt

Host info
Linux 6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Docker version 26.1.2, build 211e74b
cmake version 3.28.4
python 3.11.6
GeForce RTX 4090
Driver Version: 550.54.15
