
Can't build python+onnx+tensorrtllm backends r24.04 #7236

Open
gulldan opened this issue May 17, 2024 · 3 comments
Labels
investigating — The development team is investigating this issue

Comments


gulldan commented May 17, 2024

I'm trying https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/compose.md
to build the onnx+python+tensorrtllm backends.

As mentioned in the doc, I run:

git clone --single-branch --depth=1 -b r24.04 https://github.com/triton-inference-server/server.git
python3 compose.py --backend onnxruntime --backend python --repoagent checksum --image min,nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 --image full,nvcr.io/nvidia/tritonserver:24.04-py3

It builds, but when I start the Triton server I get:

E0517 12:18:34.314931 164 model_lifecycle.cc:638] failed to load 'llama3_tensorrt_llm' version 1: Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the mode configuration.

Models with the python and onnx backends load correctly.
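For context, the "runtime" hint in that error refers to the model's config.pbtxt. A hypothetical sketch (the field and library names below are my assumptions, not from the docs) would be:

# config.pbtxt sketch for the llama3_tensorrt_llm model (hypothetical)
backend: "tensorrtllm"
# assumed: name of the backend shared library Triton should load for this model
runtime: "libtriton_tensorrtllm.so"

That alone can't help here, though, since the backend library itself is missing from the image.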

How can I build a combined Docker image that uses all of these backends?

python3 compose.py --backend tensorrtllm --backend python --backend onnxruntime --repoagent checksum --container-version 24.04

This fails too; the tensorrtllm backend is not found:

=> CACHED [stage-1 16/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/include include/  0.0s
=> CACHED [stage-1 17/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/python /opt/tritonserver/backends/python  0.0s
=> CACHED [stage-1 18/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/onnxruntime /opt/tritonserver/backends/onnxruntime  0.0s
=> ERROR [stage-1 19/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/tensorrtllm /opt/tritonserver/backends/tensorrtllm
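Presumably the COPY step fails because the full image used here (24.04-py3) does not ship the tensorrtllm backend; it only exists in the -trtllm- images. Would something like the following untested Dockerfile sketch be a reasonable workaround? (The composed image tag below is an assumption, and the backend probably also needs the TensorRT-LLM libraries from that image.)

# hypothetical follow-up Dockerfile, not from the docs
# stage 1: the TRT-LLM container that actually contains the backend
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 AS trtllm
# stage 2: the image produced by compose.py (tag assumed)
FROM tritonserver_composed
# copy only the backend directory from the TRT-LLM image
COPY --from=trtllm /opt/tritonserver/backends/tensorrtllm /opt/tritonserver/backends/tensorrtllm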
@statiraju added the investigating (The development team is investigating this issue) label on May 17, 2024
@statiraju

Tracking ticket: [DLIS-6397]

rmccorm4 (Collaborator) commented May 17, 2024

Hi @gulldan, compose.py doesn't currently support the TensorRT-LLM backend (DLIS-6397).

You should be able to achieve something similar by using build.py with:

--backend tensorrtllm:r24.04
--backend python:r24.04
--backend onnxruntime:r24.04

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html#building-with-docker

Let us know if this helps for your use case.

gulldan (Author) commented May 18, 2024

Thank you.

I tried

./build.py --backend tensorrtllm:r24.04 --backend python:r24.04 --backend onnxruntime:r24.04 --enable-gpu --build-type Release --target-platform linux --endpoint grpc --endpoint http

but it failed:
build_log.txt

Host info
Linux 6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Docker version 26.1.2, build 211e74b
cmake version 3.28.4
python 3.11.6
GeForce RTX 4090
Driver Version: 550.54.15
