Kserve deployment certificate issue - tls: failed to verify certificate: x509: certificate signed by unknown authority\nError from server (InternalError) #3649

Open
Subhankar-Adak opened this issue Apr 29, 2024 · 3 comments
@Subhankar-Adak

/kind bug

What steps did you take and what happened:
[I am trying to deploy KServe on a bare metal Kubernetes v1.26.12 cluster, but am getting certificate validity related issues.]

Error log:

"/opt/test/kserve/kserve_manifest/kserve_manifest.yaml"], "delta": "0:00:00.675632", "end": "2024-04-27 05:27:25.793010", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2024-04-27 05:27:25.117378", "stderr": "Warning: Detected changes to resource inferenceservices.serving.kserve.io which is currently being deleted.\nError from server (InternalError): error when creating "/opt/test/kserve/kserve_manifest/kserve_manifest.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority\nError from server (InternalError): error when creating "/opt/test/kserve/kserve_manifest/kserve_manifest.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority", "stderr_lines": ["Warning: Detected changes to resource inferenceservices.serving.kserve.io which is currently being deleted.", "Error from server (InternalError): error when creating "/opt/test/kserve/kserve_manifest/kserve_manifest.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority", "Error from server (InternalError): error when creating "/opt/test/kserve/kserve_manifest/kserve_manifest.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority"], "stdout": "namespace/kserve created

: failed calling webhook \"webhook.cert-manager.io\":
failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:442/mutate?timeout=10s\":
tls: failed to verify certificate: x509: certificate has expired or is not
yet valid: current time is before 2024-04-26 T from server (InternalError): error when creating
Internal error occurred: failed calling webhook \"webhook.cert-manager. :

What did you expect to happen:
No certificate issue, also the issue is intermittent, not reproduced every time. But once it's reproduced, we are not able to proceed.

What's the InferenceService yaml:
[To help us debug please run kubectl get isvc $name -n $namespace -oyaml and paste the output]

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • certificate-manager: v1.13.0
  • Istio Version: 1.17.0
  • Knative Version: 1.11.0
  • KServe Version: 0.11.0
  • Kubeflow version: NA
  • Cloud Environment:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm] : bare metal
  • Minikube/Kind version: Kubespray - Kubernetes v 1.26.12
  • Kubernetes version: (use kubectl version): 1.26.12
  • OS (e.g. from /etc/os-release): rhel 8.8/Ubuntu 22.04
@spolti
Contributor

spolti commented Apr 29, 2024

Is it a clean installation?
You could try to delete the webhooks and restart the cert-manager:

oc get mutatingwebhookconfigurations | grep cert-manager
cert-manager-webhook                                                1          13d

oc get validatingwebhookconfigurations | grep cert-manager
cert-manager-webhook

Is there any other cert-manager operator running in the cluster?

@Subhankar-Adak
Author

It is a clean installation. There is no other cert-manager operator, only the one we deployed as part of the KServe deployment.
Currently we have reprovisioned the system and the issue is not reproduced; once the issue is reproduced we will provide more detail on it.
On a high level it seems to be a time sync issue, since the logs complain that the certificate validity has not started.
Do you have any high-level direction to investigate for the issue?

Also, we mostly observe this issue on RHEL and have not seen it on Ubuntu.

@spolti
Contributor

spolti commented Apr 30, 2024

Thanks for the feedback.
@Jooho as you've worked with certs in the past, do you have any thoughts to add on this?
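
Added example, not part of the issue thread or of the KServe or cert-manager code: both x509 messages quoted above are produced by Go's `crypto/x509` verifier, and this sketch reproduces each with constructed certificates so the two causes can be told apart. It covers a verifier clock that is behind the serving certificate's NotBefore (the time-sync theory) and a CA bundle that does not contain the CA that signed the serving certificate. The CA name, timestamps and the helpers `issue` and `Diagnose` are assumptions.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)
const webhookHost = "cert-manager-webhook.cert-manager.svc" // Service name from the error log

// issue builds a constructed cert: self-signed CA if parent is nil, else a cert signed by parent.
func issue(cn string, parent *x509.Certificate, signer ed25519.PrivateKey, nb time.Time) (*x509.Certificate, ed25519.PrivateKey) {
	pub, key, _ := ed25519.GenerateKey(rand.Reader) // errors ignored: fixed, known-good inputs
	t := &x509.Certificate{SerialNumber: big.NewInt(nb.UnixNano()), Subject: pkix.Name{CommonName: cn}, NotBefore: nb,
		NotAfter: nb.Add(24 * time.Hour), DNSNames: []string{webhookHost}, IsCA: parent == nil, BasicConstraintsValid: true}
	if parent == nil {
		parent, signer = t, key
	}
	der, _ := x509.CreateCertificate(rand.Reader, t, parent, pub, signer)
	cert, _ := x509.ParseCertificate(der)
	return cert, key
}

// Diagnose verifies the serving cert against caBundle as a client whose clock reads now.
func Diagnose(leaf, caBundle *x509.Certificate, now time.Time) string {
	pool := x509.NewCertPool()
	pool.AddCert(caBundle)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: webhookHost, CurrentTime: now})
	switch e := err.(type) {
	case nil:
		return "ok"
	case x509.CertificateInvalidError:
		if e.Reason == x509.Expired && now.Before(leaf.NotBefore) {
			return "clock-skew: verifier clock is " + leaf.NotBefore.Sub(now).String() + " behind NotBefore"
		}
	case x509.UnknownAuthorityError:
		return "stale-ca-bundle: serving cert is not signed by the CA in caBundle"
	}
	return "other: " + err.Error()
}

func main() {
	t0 := time.Date(2024, 4, 26, 0, 0, 0, 0, time.UTC) // constructed issuance time
	ca, caKey := issue("constructed-webhook-ca", nil, nil, t0.Add(-time.Hour))
	leaf, _ := issue(webhookHost, ca, caKey, t0)
	fmt.Println(Diagnose(leaf, ca, t0.Add(-5*time.Minute))) // verifier clock 5 minutes behind
}
```

A "clock-skew" result points at NTP/chrony on the node running the verifier; a "stale-ca-bundle" result points at the CA bundle held by the webhook configuration, which is what deleting the webhook configurations and restarting cert-manager regenerates.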
