
Unable to run InferenceService on a local cluster #3689

Closed
yurkoff-mv opened this issue May 14, 2024 · 12 comments

@yurkoff-mv

/kind bug

What steps did you take and what happened:
I have a local cluster without internet access. Kubeflow manifests 1.8 are deployed on it; I deployed this version using images imported as tar files. I also imported the image for the InferenceService as a tar file, but the service does not start. Running microk8s kubectl describe inferenceservices -n kubeflow-namespace llm shows the following error message:

Revision "llm -predictor-00001" failed with message: Unable to fetch image "yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://index.docker.io/v2 /": read tcp 10.1.22.219:48238->54.198.86.24:443: read: connection reset by peer.

Moreover, the image is present in the microk8s ctr image store:
microk8s ctr images list | grep yurkoff

docker.io/yurkoff/torchserve-kfs:0.9.0-gpu                                                                                                     application/vnd.docker.distribution.manifest.v2+json      sha256:1b771d7c0c2d26f78e892997cb00e6051c77cf3654827c4715aa5a502267ee76 5.7 GiB    linux/amd64                                                                                             io.cri-containerd.image=managed

On a machine with internet access:

microk8s ctr images pull docker.io/yurkoff/torchserve-kfs:0.9.0-gpu
microk8s ctr images export yurkoff_torchserve-kfs_0.9.0-gpu.tar docker.io/yurkoff/torchserve-kfs:0.9.0-gpu

On the local machine without internet access:

microk8s ctr images import yurkoff_torchserve-kfs_0.9.0-gpu.tar
microk8s kubectl apply -f llm_isvc.yaml

What did you expect to happen:
Successful deployment of InferenceService

What's the InferenceService yaml:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "llm"
  namespace: "kubeflow-namespace"
spec:
  predictor:
    pytorch:
      protocolVersion: v1
      runtimeVersion: "0.9.0-gpu"
      image: "yurkoff/torchserve-kfs:0.9.0-gpu"
      imagePullPolicy: "Never"
      storageUri: pvc://torchserve-claim/llm
      resources:
        requests:
          cpu: "2"
          memory: 16Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "4"
          memory: 30Gi
          nvidia.com/gpu: "1"
    minReplicas: 1
    maxReplicas: 1
    timeout: 180

Please note that I specifically set imagePullPolicy: "Never"

Anything else you would like to add:
I would like to note that Kubeflow itself was successfully deployed from the local images.
It turns out that the InferenceService tries to reach the registry from outside even when the image is present in the containerd store; the imagePullPolicy: "Never" option has no effect.
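
For comparison, a plain Deployment does pick up the imported image. A minimal sketch of such a check, using only the image already imported into containerd above (the Deployment name and labels are illustrative, and resource requests are omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: torchserve-local-check   # illustrative name
  namespace: kubeflow-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: torchserve-local-check
  template:
    metadata:
      labels:
        app: torchserve-local-check
    spec:
      containers:
        - name: torchserve
          image: docker.io/yurkoff/torchserve-kfs:0.9.0-gpu
          imagePullPolicy: Never   # use only the image already present in containerd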

Environment:

  • Istio Version: 1.17.3
  • Knative Version: 1.10.2
  • KServe Version: 0.11.2
  • Kubeflow version: 1.8
  • Kubernetes version: (use kubectl version): MicroK8s 1.28
  • OS (e.g. from /etc/os-release): Ubuntu 20.04
@spolti
Contributor

spolti commented May 14, 2024

Hi, I've never used microk8s before, but a few things might be causing this:

First, shouldn't you use the complete image name instead of just yurkoff/torchserve-kfs:0.9.0-gpu?

Secondly, this looks strange:

"https://index.docker.io/v2 /"

Notice the space before the last /. You might want to investigate why this API address has an extra space at the end.

@yurkoff-mv
Author

yurkoff-mv commented May 15, 2024

Hello!
Thanks for the reply.
There is no space there; apparently it got garbled when copying from the Linux terminal. I also tried using the full name (docker.io/yurkoff/torchserve-kfs:0.9.0-gpu):

Revision "llm-predictor-00001" failed with message: Unable to fetch image "docker.io/yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://index.docker.io/v2/": read tcp 10.1.22.219:40004->54.236.113.205:443: read: connection reset by peer.

Interestingly, Kubeflow itself deploys automatically from the local images, but the InferenceService cannot.

@spolti
Contributor

spolti commented May 15, 2024

You might need to do this in your isvc: https://kserve.github.io/website/0.11/modelserving/v1beta1/custom/custom_model/#deploy-the-rest-custom-serving-runtime-on-kserve
Using the SHA digest might be helpful as well.
The PodSpec is exposed inline in the isvc, so any PodSpec field is available, as in the example in that link.

@yurkoff-mv
Author

I don't quite understand what exactly I need to do. I built the image with Docker. It downloads and deploys successfully in a cluster with internet access. From that cluster I export the image as a tar file and import it into the cluster without internet. For some reason the InferenceService decides the image does not exist and tries to download it, whereas a plain Deployment sees the image as present.

@spolti
Contributor

spolti commented May 15, 2024

See the InferenceService structure in the link I sent you. imagePullPolicy and the container image are properties of the containers field.

@yurkoff-mv
Author

Sorry, but I didn't find any mention of imagePullPolicy in the link provided. However, this parameter is in the description of V1beta1TorchServeSpec.

@yurkoff-mv
Author

yurkoff-mv commented May 16, 2024

I tried setting up a local registry. I pushed my image yurkoff/torchserve-kfs:0.9.0-gpu to it, but I get the following error:

Message:               Revision "llm-predictor-00001" failed with message: Unable to fetch image "127.0.0.1:32000/yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://127.0.0.1:32000/v2/": dial tcp 127.0.0.1:32000: connect: connection refused; Get "http://127.0.0.1:32000/v2/": dial tcp 127.0.0.1:32000: connect: connection refused.

And yet the registry is reachable:
curl -v http://127.0.0.1:32000/v2/

*   Trying 127.0.0.1:32000...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 32000 (#0)
> GET /v2/ HTTP/1.1
> Host: 127.0.0.1:32000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 2
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Thu, 16 May 2024 11:03:13 GMT
< 
{}
* Connection #0 to host 127.0.0.1 left intact

curl -v http://127.0.0.1:32000/v2/_catalog

*   Trying 127.0.0.1:32000...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 32000 (#0)
> GET /v2/_catalog HTTP/1.1
> Host: 127.0.0.1:32000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Thu, 16 May 2024 12:00:45 GMT
< Content-Length: 44
< 
{"repositories":["yurkoff/torchserve-kfs"]}
* Connection #0 to host 127.0.0.1 left intact

I can't understand what information the InferenceService wants to get from outside when everything is available locally.
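
A note on the connection refused: Knative's tag-to-digest resolution runs inside the knative-serving controller pod, where 127.0.0.1 refers to the pod itself rather than the node, which would explain why it fails even though the registry answers on the host. A quick way to check what is reachable from inside the cluster, assuming the MicroK8s registry addon defaults (Service registry in the container-registry namespace on port 5000):

microk8s kubectl run regcheck --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -s http://registry.container-registry.svc.cluster.local:5000/v2/_catalog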

@spolti
Contributor

spolti commented May 17, 2024

Hi, what I meant was to use this structure:

spec:
  predictor:
    containers:
      - name: kserve-container
        image: xxx
        ports: xxx

or you can define it in your custom Serving Runtime as well.

@yurkoff-mv
Author

yurkoff-mv commented May 21, 2024

Hi, @spolti!
I tried this, with the same result.
My YAML file:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "llm"
  namespace: "kubeflow-megaputer"
spec:
  predictor:
    containers:
      - name: kserve-container
        image: "yurkoff/torchserve-kfs:0.9.0-gpu"
        imagePullPolicy: IfNotPresent
#        storageUri: pvc://torchserve-claim/llm
        env:
          - name: STORAGE_URI
            value: pvc://torchserve-claim/llm
        resources:
          requests:
            cpu: "2"
            memory: 16Gi
            nvidia.com/gpu: "1"
          limits:
            cpu: "4"
            memory: 24Gi
            nvidia.com/gpu: "1"

@israel-hdez
Contributor

Looks like you are using KServe serverless mode, which uses Knative.

Knative always tries to resolve image tags to digests, which is an operation that requires access to the registry (reference: https://knative.dev/docs/serving/tag-resolution/)

Thus, you may want to try using the digest of your image in the InferenceService instead of the 0.9.0-gpu tag.
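
For example, the predictor image could be pinned by digest, since an image already referenced by digest needs no tag resolution. A sketch reusing the digest that ctr images list printed earlier in this thread (only the relevant fields shown; the rest of the spec stays as before):

spec:
  predictor:
    pytorch:
      protocolVersion: v1
      image: "yurkoff/torchserve-kfs@sha256:1b771d7c0c2d26f78e892997cb00e6051c77cf3654827c4715aa5a502267ee76"
      imagePullPolicy: "Never"
      storageUri: pvc://torchserve-claim/llm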

@spolti
Contributor

spolti commented May 23, 2024

Nice @israel-hdez , didn't spot it :D

@yurkoff-mv
Author

yurkoff-mv commented May 27, 2024

Hi, @israel-hdez, @spolti! Thanks a lot! This works for me!
I edited the ConfigMap config-deployment

microk8s kubectl edit configmap config-deployment -n knative-serving

by adding the following line:

registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,index.docker.io"

and the local image was successfully used by the InferenceService.
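
For reference, the relevant part of the edited ConfigMap looks like this (only the added key is shown; the rest of config-deployment is unchanged):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,index.docker.io"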
