
Unable to run InferenceService on a local cluster #3689

Closed
yurkoff-mv opened this issue May 14, 2024 · 12 comments

@yurkoff-mv

/kind bug

What steps did you take and what happened:
I have a local cluster without internet access. Kubeflow manifests 1.8 are deployed on it; I deployed this version using images imported as tar files. I also imported the image for the InferenceService as a tar file, but the service does not start. Running microk8s kubectl describe inferenceservices -n kubeflow-namespace llm shows the following error message:

Revision "llm -predictor-00001" failed with message: Unable to fetch image "yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://index.docker.io/v2 /": read tcp 10.1.22.219:48238->54.198.86.24:443: read: connection reset by peer.

Moreover, the image is present in the microk8s ctr image store:
microk8s ctr images list | grep yurkoff

docker.io/yurkoff/torchserve-kfs:0.9.0-gpu                                                                                                     application/vnd.docker.distribution.manifest.v2+json      sha256:1b771d7c0c2d26f78e892997cb00e6051c77cf3654827c4715aa5a502267ee76 5.7 GiB    linux/amd64                                                                                             io.cri-containerd.image=managed

On a machine with internet access:

microk8s ctr images pull docker.io/yurkoff/torchserve-kfs:0.9.0-gpu
microk8s ctr images export yurkoff_torchserve-kfs_0.9.0-gpu.tar docker.io/yurkoff/torchserve-kfs:0.9.0-gpu

On the local machine without internet access:

microk8s ctr images import yurkoff_torchserve-kfs_0.9.0-gpu.tar
microk8s kubectl apply -f llm_isvc.yaml

What did you expect to happen:
Successful deployment of InferenceService

What's the InferenceService yaml:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "llm"
  namespace: "kubeflow-namespace"
spec:
  predictor:
    pytorch:
      protocolVersion: v1
      runtimeVersion: "0.9.0-gpu"
      image: "yurkoff/torchserve-kfs:0.9.0-gpu"
      imagePullPolicy: "Never"
      storageUri: pvc://torchserve-claim/llm
      resources:
        requests:
          cpu: "2"
          memory: 16Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "4"
          memory: 30Gi
          nvidia.com/gpu: "1"
    minReplicas: 1
    maxReplicas: 1
    timeout: 180

Please note that I specifically set imagePullPolicy: "Never"

Anything else you would like to add:
I would like to note that Kubeflow itself was successfully deployed from the local images.
It turns out that the InferenceService tries to reach the registry from outside even when the image is present in the containerd store; the imagePullPolicy: "Never" option has no effect.
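
For comparison, a plain Deployment does pick up the imported image. A minimal sketch of such a check, using only the image already imported into containerd above (the Deployment name and labels are illustrative, and resource requests are omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: torchserve-local-check   # illustrative name
  namespace: kubeflow-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: torchserve-local-check
  template:
    metadata:
      labels:
        app: torchserve-local-check
    spec:
      containers:
        - name: torchserve
          image: docker.io/yurkoff/torchserve-kfs:0.9.0-gpu
          imagePullPolicy: Never   # use only the image already present in containerd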

Environment:

  • Istio Version: 1.17.3
  • Knative Version: 1.10.2
  • KServe Version: 0.11.2
  • Kubeflow version: 1.8
  • Kubernetes version: (use kubectl version): MicroK8s 1.28
  • OS (e.g. from /etc/os-release): Ubuntu 20.04
@spolti
Contributor

spolti commented May 14, 2024

Hi, I've never used microk8s before, but a few things might be causing this:

First, shouldn't you use the complete image name instead of just yurkoff/torchserve-kfs:0.9.0-gpu?

Secondly, this looks strange:

"https://index.docker.io/v2 /"

Notice the space before the last /. You might want to investigate why this API address has an extra space at the end.

@yurkoff-mv
Author

yurkoff-mv commented May 15, 2024

Hello!
Thanks for the reply.
There is no space there; apparently it got garbled when copying from the Linux terminal. I also tried using the full name (docker.io/yurkoff/torchserve-kfs:0.9.0-gpu):

Revision "llm-predictor-00001" failed with message: Unable to fetch image "docker.io/yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://index.docker.io/v2/": read tcp 10.1.22.219:40004->54.236.113.205:443: read: connection reset by peer.

Interestingly, Kubeflow itself deploys automatically from the local images, but the InferenceService cannot.

@spolti
Contributor

spolti commented May 15, 2024

You might need to do this in your isvc: https://kserve.github.io/website/0.11/modelserving/v1beta1/custom/custom_model/#deploy-the-rest-custom-serving-runtime-on-kserve
Using the SHA digest might be helpful as well.
The PodSpec is exposed inline in the isvc, so any PodSpec field is available, as in the example in that link.

@yurkoff-mv
Author

I don't quite understand what exactly I need to do. I built the image with Docker. It downloads and deploys successfully in a cluster with internet access. From that cluster I export the image as a tar file and import it into the cluster without internet. For some reason the InferenceService decides the image does not exist and tries to download it, whereas a plain Deployment sees the image as present.

@spolti
Contributor

spolti commented May 15, 2024

See the InferenceService structure in the link I sent you. imagePullPolicy and the container image are properties of the containers field.

@yurkoff-mv
Author

Sorry, but I didn't find any mention of imagePullPolicy in the link provided. However, this parameter is in the description of V1beta1TorchServeSpec.

@yurkoff-mv
Author

yurkoff-mv commented May 16, 2024

I tried setting up a local registry. I pushed my image yurkoff/torchserve-kfs:0.9.0-gpu to it, but I get the following error:

Message:               Revision "llm-predictor-00001" failed with message: Unable to fetch image "127.0.0.1:32000/yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://127.0.0.1:32000/v2/": dial tcp 127.0.0.1:32000: connect: connection refused; Get "http://127.0.0.1:32000/v2/": dial tcp 127.0.0.1:32000: connect: connection refused.

And yet the registry is reachable:
curl -v http://127.0.0.1:32000/v2/

*   Trying 127.0.0.1:32000...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 32000 (#0)
> GET /v2/ HTTP/1.1
> Host: 127.0.0.1:32000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 2
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Thu, 16 May 2024 11:03:13 GMT
< 
{}
* Connection #0 to host 127.0.0.1 left intact

curl -v http://127.0.0.1:32000/v2/_catalog

*   Trying 127.0.0.1:32000...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 32000 (#0)
> GET /v2/_catalog HTTP/1.1
> Host: 127.0.0.1:32000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Thu, 16 May 2024 12:00:45 GMT
< Content-Length: 44
< 
{"repositories":["yurkoff/torchserve-kfs"]}
* Connection #0 to host 127.0.0.1 left intact

I can't understand what information the InferenceService wants to get from outside when everything is available locally.
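
A note on the connection refused: Knative's tag-to-digest resolution runs inside the knative-serving controller pod, where 127.0.0.1 refers to the pod itself rather than the node, which would explain why it fails even though the registry answers on the host. A quick way to check what is reachable from inside the cluster, assuming the MicroK8s registry addon defaults (Service registry in the container-registry namespace on port 5000):

microk8s kubectl run regcheck --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -s http://registry.container-registry.svc.cluster.local:5000/v2/_catalog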

@spolti
Contributor

spolti commented May 17, 2024

Hi, what I meant was to use this structure:

spec:
  predictor:
    containers:
      - name: kserve-container
        image: xxx
        ports: xxx

or you can define it in your custom Serving Runtime as well.

@yurkoff-mv
Author

yurkoff-mv commented May 21, 2024

Hi, @spolti!
I tried this, with the same result.
My YAML file:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "llm"
  namespace: "kubeflow-megaputer"
spec:
  predictor:
    containers:
      - name: kserve-container
        image: "yurkoff/torchserve-kfs:0.9.0-gpu"
        imagePullPolicy: IfNotPresent
#        storageUri: pvc://torchserve-claim/llm
        env:
          - name: STORAGE_URI
            value: pvc://torchserve-claim/llm
        resources:
          requests:
            cpu: "2"
            memory: 16Gi
            nvidia.com/gpu: "1"
          limits:
            cpu: "4"
            memory: 24Gi
            nvidia.com/gpu: "1"

@israel-hdez
Contributor

Looks like you are using KServe serverless mode, which uses Knative.

Knative always tries to resolve image tags to digests, which is an operation that requires access to the registry (reference: https://knative.dev/docs/serving/tag-resolution/)

Thus, you may want to try using the digest of your image in the InferenceService instead of the 0.9.0-gpu tag.
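
For example, the predictor image could be pinned by digest, since an image already referenced by digest needs no tag resolution. A sketch reusing the digest that ctr images list printed earlier in this thread (only the relevant fields shown; the rest of the spec stays as before):

spec:
  predictor:
    pytorch:
      protocolVersion: v1
      image: "yurkoff/torchserve-kfs@sha256:1b771d7c0c2d26f78e892997cb00e6051c77cf3654827c4715aa5a502267ee76"
      imagePullPolicy: "Never"
      storageUri: pvc://torchserve-claim/llm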

@spolti
Contributor

spolti commented May 23, 2024

Nice @israel-hdez , didn't spot it :D

@yurkoff-mv
Author

yurkoff-mv commented May 27, 2024

Hi, @israel-hdez, @spolti! Thanks a lot! This works for me!
I edited the ConfigMap config-deployment

microk8s kubectl edit configmap config-deployment -n knative-serving

by adding the following line:

registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,index.docker.io"

and the local image was successfully used by the InferenceService.
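
For reference, the relevant part of the edited ConfigMap looks like this (only the added key is shown; the rest of config-deployment is unchanged):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,index.docker.io"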
