Registry TLS configuration from registries.yaml is only honored for mirror endpoints #9839

Closed
intrand opened this issue Mar 30, 2024 · 7 comments

@intrand

intrand commented Mar 30, 2024

Environmental Info:
K3s Version:

1.29.3+k3s1

Node(s) CPU architecture, OS, and Version:

Linux pi3 5.15.0-1049-raspi #52-Ubuntu SMP PREEMPT Thu Mar 14 08:39:42 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Cluster Configuration:
1 control-plane node running etcd and 5 worker nodes, all identical Raspberry Pi computers

Describe the bug:
Configuring /etc/rancher/k3s/registries.yaml with the bare minimum for a private registry with a self-signed certificate no longer works in 1.29.3+k3s1. Downgrading to 1.29.2+k3s1 makes it work again without any other changes.

---
configs:
  "registry.domain.tld":
    tls:
      ca_file: /usr/local/share/ca-certificates/ca_from_cluster.pem

Steps To Reproduce:

  1. Configure /etc/rancher/k3s/registries.yaml as above.
  2. Install k3s using the latest channel (currently version 1.29.3+k3s1).
  3. Deploy a container whose image comes from that registry.
  4. Observe a "certificate signed by unknown authority" error emitted by containerd, visible in the events from kubectl describe pod $pod_name.
  5. Downgrade to 1.29.2+k3s1.
  6. Delete the pod.
  7. Observe the image pulling without issue.

Expected behavior:

The image pulls correctly, as it did in the previous release :)

Actual behavior:

TLS verification errors and failed image pulls.

Additional context / logs:

Not to lead you down a rabbit hole, but perhaps this is related: #9341

@brandond
Contributor

Can you confirm that you are not using a custom containerd config template? Can you provide the output of find /var/lib/rancher/k3s/agent/etc/containerd/ -type f -print -exec cat {} \; along with containerd.log showing the failed pull?

@intrand
Author

intrand commented Mar 30, 2024

> Can you confirm that you are not using a custom containerd config template? Can you provide the output of find /var/lib/rancher/k3s/agent/etc/containerd/ -type f -print -exec cat {} \; along with containerd.log showing the failed pull?

I have not touched the template at all. I also inspected the containerd toml and compared everything that seemed relevant to a backup from an earlier version and everything was identical.

I no longer have the containerd log. Are you unable to reproduce this behavior in 1.29.3+k3s1? 🤔 If absolutely necessary, I can destroy my cluster and rebuild it from scratch, but that should be the last resort.

EDIT: The cluster is up and running on 1.29.2+k3s1 and serving traffic, so testing this on the same hardware would be disruptive for me. I can try on another machine, but so can anyone :) It would be nice to see whether anyone else can reproduce this.

@brandond
Contributor

brandond commented Apr 1, 2024

According to the containerd docs at https://github.com/containerd/containerd/blob/release/1.7/docs/hosts.md, all the host fields are valid at the root level:

For each registry host namespace directory in your registry config_path you may include a hosts.toml configuration file. The following root level toml fields apply to the registry host namespace:

This is what k3s generates:

root@systemd-node-1:/# cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/hosts.toml
# File generated by k3s. DO NOT EDIT.

server = "https://172-17-0-7.sslip.io/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/usr/local/share/ca-certificates/registry.crt"]

However, containerd fails to load that:

time="2024-04-01T22:11:02.070675417Z" level=error msg="failed to decode hosts.toml" error="invalid `host` tree"

Apparently it goes looking for at least one host section; if it can't find one it fails to use the hosts.toml file entirely, despite the presence of valid config at the root level.

As a workaround, we can generate an empty host section; the following works properly:

root@systemd-node-1:/# cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/hosts.toml
# File generated by k3s. DO NOT EDIT.

server = "https://172-17-0-7.sslip.io/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/usr/local/share/ca-certificates/registry.crt"]

[host]

I can address this in the next release. In the meantime, if you do not currently specify a port in your registry namespace, you should be able to work around the issue with something like this in your registries.yaml:

mirrors:
 172-17-0-7.sslip.io:
   endpoint:
     - https://172-17-0-7.sslip.io:443
configs:
 "172-17-0-7.sslip.io:443":
   tls:
     ca_file: /usr/local/share/ca-certificates/registry.crt

Note the use of a port in the endpoint, which forces k3s to generate a host entry in the hosts.toml.
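
For anyone applying this to several registries, the same structure can be rendered from a small template. The sketch below is illustrative only: the function name `workaround_registries_yaml` and its parameters are hypothetical, it assumes the namespace has no port (as the workaround requires), and it produces plain text rather than going through a YAML library.

```python
def workaround_registries_yaml(namespace: str, ca_file: str, port: int = 443) -> str:
    """Render the mirror-with-port workaround for a single registry namespace.

    Hypothetical helper; the layout mirrors the example above.
    """
    if ":" in namespace:
        # The workaround only applies when the namespace carries no port.
        raise ValueError("namespace must not already include a port")
    endpoint = f"{namespace}:{port}"
    return (
        "mirrors:\n"
        f"  {namespace}:\n"
        "    endpoint:\n"
        f"      - https://{endpoint}\n"
        "configs:\n"
        f'  "{endpoint}":\n'
        "    tls:\n"
        f"      ca_file: {ca_file}\n"
    )


if __name__ == "__main__":
    print(workaround_registries_yaml(
        "172-17-0-7.sslip.io",
        "/usr/local/share/ca-certificates/registry.crt",
    ))
```

The key point the template preserves is that the `configs` key matches the endpoint including its port, not the bare namespace.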

@brandond
Contributor

brandond commented Apr 1, 2024

@intrand
Author

intrand commented Apr 2, 2024

Thank you very much for going through the work to reproduce this, @brandond!

@brandond
Contributor

brandond commented Apr 2, 2024

Using 172-17-0-7.sslip.io as an example registry, the two possible workarounds are:

  1. If your registry namespace does not currently include a port, configure a mirror endpoint with a port:
    mirrors:
      172-17-0-7.sslip.io:
        endpoint:
          - https://172-17-0-7.sslip.io:443
    configs:
      "172-17-0-7.sslip.io:443":
        tls:
          ca_file: /usr/local/share/ca-certificates/registry.crt
  2. Manually drop the CA certificate into the registry namespace's configuration directory, and make it immutable so that k3s does not remove it when restarting:
    mkdir -p /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/
    cp /usr/local/share/ca-certificates/registry.crt /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/ca.crt
    chattr +i /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/ca.crt
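
The mkdir and cp steps of the second workaround could be scripted roughly as follows. This is a sketch under stated assumptions: the function name `install_registry_ca` and the `certs_root` parameter are hypothetical, the default path is the one shown above, and the `chattr +i` step is deliberately left out, so it still has to be run separately on the node.

```python
import shutil
import tempfile
from pathlib import Path

# Default certs.d location, taken from the commands above.
CERTS_D = Path("/var/lib/rancher/k3s/agent/etc/containerd/certs.d")


def install_registry_ca(ca_src, namespace: str, certs_root=CERTS_D) -> Path:
    """Copy a CA certificate to <certs_root>/<namespace>/ca.crt.

    Hypothetical helper covering only the mkdir and cp steps; it does not
    make the file immutable.
    """
    target_dir = Path(certs_root) / namespace
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "ca.crt"
    shutil.copyfile(ca_src, target)
    return target


if __name__ == "__main__":
    # Demonstrate against a temporary directory instead of the real node path.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "registry.crt"
        src.write_text("placeholder certificate\n")
        print(install_registry_ca(src, "172-17-0-7.sslip.io", Path(tmp) / "certs.d"))
```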

@brandond brandond changed the title from "private registries regression; self-signed certificate authority no longer verified in latest release" to "Registry TLS configuration from registries.yaml is only honored for mirror endpoints" Apr 2, 2024
@brandond brandond pinned this issue Apr 2, 2024
@aganesh-suse aganesh-suse self-assigned this Apr 8, 2024
@aganesh-suse

Validated on master branch with version v1.29.4-rc1+k3s1

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

registries.yaml:

 $ sudo cat /etc/rancher/k3s/registries.yaml
mirrors:
  pvt-registry.com:
    endpoint:
      - pvt-registry.com
  docker.io:
    endpoint:
      - pvt-registry.com      
  k8s.gcr.io:
    endpoint:
      - pvt-registry.com      
configs:
  pvt-registry.com:
    auth:
      username: xxxx
      password: xxxx
    tls:
      ca_file: /home/user/ca.pem

test-image.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: pvt-reg-test
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pvt-reg-test
  namespace: pvt-reg-test
spec:
  selector:
    matchLabels:
      k8s-app: nginx-app-clusterip
  replicas: 2
  template:
    metadata:
      labels:
        k8s-app: nginx-app-clusterip
    spec:
      containers:
      - name: nginx
        image: pvt-registry.com/nginx:latest
        ports:
        - containerPort: 8080

Testing Steps

  1. Copy config.yaml and registries.yaml
$ sudo mkdir -p /etc/rancher/k3s
$ sudo cp config.yaml /etc/rancher/k3s
$ sudo cp registries.yaml /etc/rancher/k3s
  2. Install k3s
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.29.4-rc1+k3s1' sh -s - server
  3. Verify cluster status:
kubectl get nodes -o wide
kubectl get pods -A
  4. Push an image to the private registry and deploy a pod that uses it.
    The image should be pulled and the pod should come up without any TLS certificate errors.
$ kubectl apply -f test-image.yaml
$ kubectl get pods -n pvt-reg-test
$ kubectl describe pod/pvt-reg-test-abcd -n pvt-reg-test
  5. Check the hosts.toml files for a host section

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.29.3+k3s1 (8aecc26b)
go version go1.21.8
$ kubectl get pods -A
kube-system      coredns-6799fbcd5-p7pkw                   1/1     Running            0          4m38s
kube-system      helm-install-traefik-9v8gb                0/1     Completed          1          4m38s
kube-system      helm-install-traefik-crd-5n2cw            0/1     Completed          0          4m38s
kube-system      local-path-provisioner-6c86858495-gps56   1/1     Running            0          4m38s
kube-system      metrics-server-54fd9b65b-mtzk5            1/1     Running            0          4m38s
kube-system      svclb-traefik-44e43501-4kkng              2/2     Running            0          3m26s
kube-system      svclb-traefik-44e43501-hd2qx              2/2     Running            0          4m16s
kube-system      svclb-traefik-44e43501-rx2pt              2/2     Running            0          2m37s
kube-system      svclb-traefik-44e43501-smtfd              2/2     Running            0          4m16s
kube-system      traefik-f4564c4f4-2t2l8                   1/1     Running            0          4m17s
pvt-reg-test     pvt-reg-test-64bc967f8b-6j8jk             0/1     ImagePullBackOff   0          28s
pvt-reg-test     pvt-reg-test-64bc967f8b-sgxg9             0/1     ErrImagePull       0          28s

Pod Events:

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  7m39s                   default-scheduler  Successfully assigned pvt-reg-test/pvt-reg-test-64bc967f8b-sgxg9 to ip-172-31-16-132
  Normal   Pulling    6m6s (x4 over 7m38s)    kubelet            Pulling image "pvt-registry.com/nginx:latest"
  Warning  Failed     6m6s (x4 over 7m38s)    kubelet            Failed to pull image "pvt-registry.com/nginx:latest": failed to pull and unpack image "pvt-registry.com/nginx:latest": failed to resolve reference "pvt-registry.com/nginx:latest": failed to do request: Head "https://pvt-registry.com/v2/nginx/manifests/latest": tls: failed to verify certificate: x509: certificate signed by unknown authority
  Warning  Failed     6m6s (x4 over 7m38s)    kubelet            Error: ErrImagePull
  Warning  Failed     5m54s (x6 over 7m38s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m27s (x21 over 7m38s)  kubelet            Back-off pulling image "pvt-registry.com/nginx:latest"
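
When checking many pods, the failure above can be picked out of `kubectl describe` output mechanically. The sketch below is only illustrative: the function name `images_with_untrusted_ca` is hypothetical, and it assumes the event message keeps the wording shown above.

```python
import re

# Error text and message prefix as they appear in the events above.
TLS_ERROR = "x509: certificate signed by unknown authority"
FAILED_PULL = re.compile(r'Failed to pull image "([^"]+)"')


def images_with_untrusted_ca(describe_output: str) -> list[str]:
    """Return images whose pull failed with an unknown-authority TLS error."""
    images: list[str] = []
    for line in describe_output.splitlines():
        if TLS_ERROR not in line:
            continue
        match = FAILED_PULL.search(line)
        if match and match.group(1) not in images:
            images.append(match.group(1))
    return images


SAMPLE_EVENTS = (
    '  Warning  Failed  6m6s (x4 over 7m38s)  kubelet  '
    'Failed to pull image "pvt-registry.com/nginx:latest": '
    'failed to pull and unpack image "pvt-registry.com/nginx:latest": '
    'failed to resolve reference "pvt-registry.com/nginx:latest": '
    'failed to do request: Head "https://pvt-registry.com/v2/nginx/manifests/latest": '
    'tls: failed to verify certificate: x509: certificate signed by unknown authority\n'
    '  Warning  Failed  6m6s (x4 over 7m38s)  kubelet  Error: ErrImagePull\n'
)

if __name__ == "__main__":
    print(images_with_untrusted_ca(SAMPLE_EVENTS))
```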

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.29.4-rc1+k3s1 (d973fadb)
go version go1.21.9
$ kubectl get pods -A
NAMESPACE        NAME                                      READY   STATUS            RESTARTS   AGE
kube-system      coredns-6799fbcd5-ccwrw                   1/1     Running           0          4m42s
kube-system      helm-install-traefik-667w4                0/1     Completed         1          4m43s
kube-system      helm-install-traefik-crd-2nq47            0/1     Completed         0          4m43s
kube-system      local-path-provisioner-6c86858495-dvwzt   1/1     Running           0          4m42s
kube-system      metrics-server-54fd9b65b-nkzds            1/1     Running           0          4m42s
kube-system      svclb-traefik-045f5f22-9cdff              2/2     Running           0          4m27s
kube-system      svclb-traefik-045f5f22-dnvkt              2/2     Running           0          4m27s
kube-system      svclb-traefik-045f5f22-jwx2j              2/2     Running           0          3m27s
kube-system      svclb-traefik-045f5f22-rmx7m              2/2     Running           0          2m37s
kube-system      traefik-7d5f6474df-26pw8                  1/1     Running           0          4m27s
pvt-reg-test     pvt-reg-test-66cb57586c-7ckvp             1/1     Running           0          28s
pvt-reg-test     pvt-reg-test-66cb57586c-f88jb             1/1     Running           0          28s

Check the hosts.toml files for a host section:

 $ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/pvt-registry.com/hosts.toml 
# File generated by k3s. DO NOT EDIT.

server = "https://pvt-registry.com/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/home/ubuntu/ca.pem"]


[host]

 $ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml 
# File generated by k3s. DO NOT EDIT.

server = "https://registry-1.docker.io/v2"
capabilities = ["pull", "resolve", "push"]


[host]
[host."https://pvt-registry.com/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/home/ubuntu/ca.pem"]

 $ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/k8s.gcr.io/hosts.toml 
# File generated by k3s. DO NOT EDIT.

server = "https://k8s.gcr.io/v2"
capabilities = ["pull", "resolve", "push"]


[host]
[host."https://pvt-registry.com/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/home/ubuntu/ca.pem"]
