# Module: Deploying Fine-Tuned Model with Ray and Amazon S3

## Objective

In this module, we will walk through the steps required to deploy a Ray service that serves our previously fine-tuned model, using the pre-signed URL generated earlier.
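If you no longer have that URL handy, one way to regenerate it is with the AWS CLI. This is a hypothetical example: the bucket name and object key are placeholders, assuming the zipped serving script was uploaded to S3 as described in the previous module.

```bash
# Hypothetical: regenerate a pre-signed URL for the zipped serving script.
# <YOUR_BUCKET> and the object key are placeholders; adjust to your upload.
aws s3 presign s3://<YOUR_BUCKET>/serve_script.zip --expires-in 3600
```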

## Deploy RayService to Kubernetes Cluster

### Step 1: Update the working_dir in the RayService Manifest

Objective: To provide the Ray service with the location from which it should fetch the serving script.

1. Navigate to the directory containing the Ray manifests:

   ```bash
   cd ray_serve_manifests
   ```

2. Update the `working_dir` with the previously generated pre-signed URL and save the file (a sketch of the serving entrypoint that `import_path` refers to follows this list):

   ```yaml
   serveConfigV2: |
     applications:
       - name: falcon_finetuned_financial_data
         import_path: serve_script.deployment_finetuned
         route_prefix: /falcon_finetuned_financial
         runtime_env:
           working_dir: "PASTE YOUR PREVIOUSLY GENERATED PRESIGNED URL HERE"
   ```
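For context, `import_path: serve_script.deployment_finetuned` tells Ray Serve to import a bound deployment named `deployment_finetuned` from `serve_script.py`, which it fetches from the `working_dir` archive. Below is a minimal sketch of what such an entrypoint can look like; everything except the names taken from the config above is an assumption, and the real script ships in the archive.

```python
# serve_script.py -- illustrative sketch only; the actual script is in the
# working_dir archive. Only `deployment_finetuned` comes from the config.
from ray import serve
from starlette.requests import Request


@serve.deployment(ray_actor_options={"num_gpus": 1})  # assumes 1 GPU per replica
class FinetunedModel:
    def __init__(self):
        # Load the fine-tuned model weights here (details omitted).
        self.model = None

    async def __call__(self, request: Request) -> str:
        payload = await request.json()
        # Run inference on payload["prompt"] (omitted) and return the text.
        return "generated text"


# The object referenced by import_path: serve_script.deployment_finetuned
deployment_finetuned = FinetunedModel.bind()
```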

### Step 2: Deploy the RayService

Apply the updated YAML manifest to deploy the Ray service:

```bash
kubectl apply -f 00_ray_serve_fine_tuned.yaml
```

## Verify Deployment

After applying the manifest, you can verify the status of the RayService and explore its corresponding resources:

1. Check the RayService:

   ```bash
   kubectl get rayservices -n ray-svc-finetuned
   ```

2. Check the pods:

   ```bash
   kubectl get pods -n ray-svc-finetuned
   ```

3. Check the node provisioning:

   ```bash
   kubectl get nodes -l provisioner=gpu-serve
   ```

4. Check the NVIDIA GPU Operator:

   ```bash
   kubectl get pods -n gpu-operator
   ```

5. Wait for the resources to initialize:

   ```bash
   kubectl get pods -n ray-svc-finetuned -w
   ```

Initialization can take a while because of the dependencies: the NVIDIA GPU Operator must be ready, and the Ray images must be pulled.
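If the pods stay pending or keep restarting, `kubectl describe` on the RayService surfaces its status conditions and recent events:

```bash
kubectl describe rayservice -n ray-svc-finetuned
```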

## Verifying the Deployed Application

### Check if the Service is Running

To ensure that the service exposing our fine-tuned model is active and running, execute the following command:

```bash
kubectl get svc -n ray-svc-finetuned
```

Expected Output:

You should see output similar to the following:

```
ray-svc-finetuned-head-svc                    ClusterIP   172.20.214.232   <none>        10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP   36m
ray-svc-finetuned-raycluster-kmfjg-head-svc   ClusterIP   172.20.37.172    <none>        10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP   52m
ray-svc-finetuned-serve-svc                   ClusterIP   172.20.91.55     <none>        8000/TCP
```

The service named `ray-svc-finetuned-serve-svc` should be visible and listening on port 8000/TCP.

### Access the Service

For the sake of this demonstration, we are not using a LoadBalancer or Ingress to expose the service. Instead, we'll use the `port-forward` command to route traffic to the service.

Note: Open a new terminal for this step.

```bash
kubectl port-forward svc/ray-svc-finetuned-serve-svc 8000:8000 -n ray-svc-finetuned
```
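With the port forward in place, you can optionally smoke-test the endpoint before setting up the chatbot. The route prefix comes from the manifest above, but the JSON payload is an assumption; the exact request schema depends on the serving script.

```bash
# Hypothetical smoke test; adjust the payload to the serve script's schema.
curl -X POST http://localhost:8000/falcon_finetuned_financial \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Why do I need an emergency fund?"}'
```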

## Interacting with the Deployed Model

### Set Up the Chatbot Application

We have prepared a sample chatbot application that will interact with our deployed model.

1. Navigate to the sample application directory:

   ```bash
   cd ../sample_app
   ```

2. Install the Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Define the model endpoint as an environment variable (a sketch of how the app might consume it follows this list):

   ```bash
   export MODEL_ENDPOINT="/falcon_finetuned_financial"
   ```
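As referenced in step 3, here is a minimal sketch of how an application like `app.py` might consume `MODEL_ENDPOINT`. The base URL and request schema are assumptions for illustration, not the actual sample-app code.

```python
# Illustrative only -- the real logic lives in sample_app/app.py.
import os

import requests

BASE_URL = "http://localhost:8000"       # the port-forwarded Serve service
ENDPOINT = os.environ["MODEL_ENDPOINT"]  # e.g. /falcon_finetuned_financial


def ask(prompt: str) -> str:
    # Assumed request schema; the sample app may differ.
    resp = requests.post(BASE_URL + ENDPOINT, json={"prompt": prompt})
    resp.raise_for_status()
    return resp.text
```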

### Run the Application

With the environment variable set, you can now run the chatbot application:

```bash
python app.py
```

### Access the Chatbot

Open your web browser and navigate to http://127.0.0.1:7860 to start interacting with your deployed model.

Let's ask the same question that we asked the non-fine-tuned model:

Question: Why do I need an emergency fund if I already have investments?

*(Screenshot: Chat Bot)*