NVIDIA-Triton-Deployment-Quickstart

This README provides a guide for deploying a basic DenseNet image-classification model (ONNX format) on the Triton Inference Server.

This quickstart guide is an extended version of the official tutorial available at triton-inference-server/tutorials/Quick_Deploy/ONNX/README.md. The official tutorial might be a bit succinct, especially for those new to the Triton Inference Server, so this guide aims to offer more detailed steps to make the deployment process more accessible.

If you're using Linux or macOS, you can follow this quickstart in your terminal. If you're on Windows, note that CMD will not work with the commands below (they use the ${PWD} variable); use Windows PowerShell instead.

Step 0: Install Docker

Follow the Docker installation instructions for your operating system. You can find a comprehensive step-by-step guide here.

Step 1: Create Model Repository

To perform inference on your model with Triton, it's necessary to create a model repository.

The structure of the repository should be:

  <model-repository-path>/
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      <version>/
        <model-definition-file>
      <version>/
        <model-definition-file>
      ...
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      <version>/
        <model-definition-file>
      <version>/
        <model-definition-file>
      ...
    ...

(The config.pbtxt configuration file is optional. If you don't provide one, Triton Inference Server will generate a configuration automatically for model formats that support it, such as ONNX.)
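For reference, a hand-written config.pbtxt for this example could look roughly like the sketch below. The tensor names (data_0, fc6_1), shapes, and reshape settings are assumptions taken from the densenet121-1.2 example model; verify them against your model before relying on them.

  # Sketch only: names, shapes, and reshape settings are assumptions for densenet121-1.2
  name: "densenet_onnx"
  platform: "onnxruntime_onnx"
  max_batch_size: 0
  input [
    {
      name: "data_0"
      data_type: TYPE_FP32
      format: FORMAT_NCHW
      dims: [ 3, 224, 224 ]
      reshape { shape: [ 1, 3, 224, 224 ] }
    }
  ]
  output [
    {
      name: "fc6_1"
      data_type: TYPE_FP32
      dims: [ 1000 ]
      reshape { shape: [ 1, 1000, 1, 1 ] }
    }
  ]

In this quickstart we rely on the autogenerated configuration, so no config.pbtxt needs to be created.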

Therefore, the first step is to set up the directory structure for the model repository.

mkdir -p model_repository/densenet_onnx/1

Next, download the example DenseNet model (densenet121, ONNX format) and place it in the appropriate directory.

wget -O model_repository/densenet_onnx/1/model.onnx "https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx"

Now, if you run the tree command, you should see the following directory structure, which matches the model repository layout described above.

model_repository
|
+-- densenet_onnx
    |
    +-- 1
        |
        +-- model.onnx

Step 2: Set Up Triton Inference Server

Make sure your current directory is the one that contains the newly created model_repository directory, so that the path ./model_repository refers to the actual model repository. If you are somewhere else, navigate there first.

Next, run the pre-built Docker container for the Triton Inference Server:

docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v "${PWD}/model_repository:/models" nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models

If you encounter a permission error, prepend sudo to the command. If the 23.06 release of Triton Inference Server is not available, refer to the official release notes to find an available version. (Ports 8000, 8001, and 8002 expose Triton's HTTP, gRPC, and metrics endpoints, respectively.)

Once the Docker image has been successfully pulled and the container is up and running, the server prints a large amount of log output. Within it, you should find a table like this:

+---------------+---------+--------+
| Model         | Version | Status |
+---------------+---------+--------+
| densenet_onnx | 1       | READY  |
+---------------+---------+--------+

This indicates our model has been deployed on the server and is now ready to perform inference.

The server must stay up and running while the client queries it, so after starting the server's Docker container, keep that terminal open; closing it may stop the container.

Step 3: Set Up Triton Client

Set up the client container in a separate terminal, distinct from the one running the Triton server.

Run the pre-built Docker container for the Triton client SDK:

docker run -it --rm --net=host -v "${PWD}:/workspace/" nvcr.io/nvidia/tritonserver:23.06-py3-sdk bash

If you encounter a permission error, prepend sudo to the command. As before, if the 23.06 SDK image is not available, refer to the official release notes and pick an available version, ideally matching the server version.

Once the Docker image has been successfully pulled and the container is up and running, you will find yourself in an interactive Bash shell session within the container.
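The SDK container ships with the tritonclient Python package and shares the host network (--net=host), so you can optionally confirm that the server from Step 2 is reachable before going further. A minimal sketch, assuming the model name densenet_onnx from Step 1:

  # check_ready.py -- optional readiness check (sketch)
  import tritonclient.http as httpclient

  # The server started in Step 2 serves HTTP requests on port 8000.
  client = httpclient.InferenceServerClient(url="localhost:8000")

  print("Server ready:", client.is_server_ready())
  print("Model ready:", client.is_model_ready("densenet_onnx"))

Both calls should print True; if not, check the server terminal for errors before continuing.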

Install the torchvision package; the client script uses it to preprocess the input image.

pip install torchvision

Download the example photo that will be used for inference.

wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"

Step 4: Using a Triton Client to Query the Server

Download the example Python script client.py for querying the server:

wget -O client.py "https://raw.githubusercontent.com/Achiwilms/NVIDIA-Triton-Deployment-Quickstart/main/client.py"

Execute the client.py script to send the inference request:

python client.py

Once inference finishes and the results are sent back to the client, the top predictions are printed. Each entry has the format <confidence_score>:<classification_index>.

['11.549026:92' '11.232335:14' '7.528014:95' '6.923391:17' '6.576575:88']

To learn more about how the request is made, explore the client.py file; the comments within the script explain each step.
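For orientation, the sketch below shows the kind of request such a client makes with the tritonclient HTTP API. The tensor names data_0 and fc6_1, the ImageNet-style preprocessing, and the class_count value are assumptions based on the densenet121-1.2 example model, so the actual client.py may differ in details.

  # client_sketch.py -- illustrative sketch, not the actual client.py
  import tritonclient.http as httpclient
  from PIL import Image
  from torchvision import transforms

  def preprocess(img_path):
      # ImageNet-style preprocessing (assumed to match what the model expects)
      img = Image.open(img_path).convert("RGB")
      pipeline = transforms.Compose([
          transforms.Resize(256),
          transforms.CenterCrop(224),
          transforms.ToTensor(),
          transforms.Normalize(mean=[0.485, 0.456, 0.406],
                               std=[0.229, 0.224, 0.225]),
      ])
      return pipeline(img).numpy()  # shape (3, 224, 224), dtype float32

  transformed_img = preprocess("img1.jpg")

  # Connect to the Triton server from Step 2 (HTTP endpoint on port 8000).
  client = httpclient.InferenceServerClient(url="localhost:8000")

  # Describe the input tensor and attach the preprocessed image.
  # If the server reports a shape mismatch, add a leading batch dimension
  # (transformed_img[None, ...]) to match the served model configuration.
  infer_input = httpclient.InferInput("data_0", transformed_img.shape, datatype="FP32")
  infer_input.set_data_from_numpy(transformed_img, binary_data=True)

  # Request the top 5 classes of the output tensor.
  infer_output = httpclient.InferRequestedOutput("fc6_1", binary_data=True, class_count=5)

  # Send the request and print results in <confidence_score>:<classification_index> form.
  response = client.infer(model_name="densenet_onnx", inputs=[infer_input], outputs=[infer_output])
  print(response.as_numpy("fc6_1"))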


You've successfully deployed a model on the Triton Inference Server. Congratulations! 🎉

If you encounter any difficulties at any step, feel free to contact me for assistance at this email address.