What is the issue?
We are running a private AKS cluster, version 1.26.x, with Linkerd stable version 2.14.2 installed and linkerd-cni enabled.
The AKS cluster has OIDC enabled, which is designed to rotate the signing keys automatically on a periodic basis.
After the OIDC keys were automatically rotated, all new pods got stuck with the following error:
“FailedCreatePodSandBox (x556 over ) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3756782430d4016076288c700b871e4325ca2d5d6bdd7a422697c7d3b54d23e6": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized”
We found that the issue started after an automatic RotateServiceAccountSigningKeys operation.
We tried reconciling the cluster by running an “az aks update” command, but the issue persisted.
We then created a new token for the default service account in the default namespace and created a new pod with it, but the issue persisted.
Next, we ran “az aks oidc-issuer rotate-signing-keys” twice, but the issue persisted.
Lastly, we reasoned that since new pods were failing with an Unauthorized error from the linkerd-cni plugin, the problem had to lie with the linkerd-cni pods themselves. We therefore deleted the linkerd-cni daemonset pods. The replacement pods obtained a fresh token, and new pods could be created again.
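For anyone hitting the same error, the workaround described above can be sketched roughly as follows. This is only an illustration: the namespace and daemonset name (both linkerd-cni) are assumed defaults and may differ in your install, and the run helper and DRY_RUN switch are hypothetical conveniences so the commands are printed before anything is changed.

```shell
#!/bin/sh
# Sketch: restart the linkerd-cni daemonset so its pods start with a fresh token.
# CNI_NAMESPACE and CNI_DAEMONSET are assumed defaults; adjust to your install.
set -eu

CNI_NAMESPACE="${CNI_NAMESPACE:-linkerd-cni}"
CNI_DAEMONSET="${CNI_DAEMONSET:-linkerd-cni}"

# Hypothetical helper: print commands by default, run them when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

restart_linkerd_cni() {
  # Trigger a rolling recreation of the daemonset pods.
  run kubectl -n "$CNI_NAMESPACE" rollout restart "daemonset/$CNI_DAEMONSET"
  # Wait for the rollout to finish before scheduling new workloads.
  run kubectl -n "$CNI_NAMESPACE" rollout status "daemonset/$CNI_DAEMONSET"
}

restart_linkerd_cni
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to apply them.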
After restarting the linkerd-cni daemonset we were able to deploy new pods, but the existing pods in the Linkerd-meshed namespace started reporting invalid certificate errors, and pod-to-pod communication was impacted.
We checked the issuer certificate and it was valid. We had to redeploy Linkerd to get rid of this issue.
We need your help troubleshooting this Linkerd issue with OIDC.
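For reference, an issuer certificate check like the one mentioned above could look roughly like this. It is a sketch under assumptions: the linkerd namespace, the linkerd-identity-issuer secret name, and the crt.pem key are the defaults of a standard install as far as I know (a cert-manager style secret would likely use tls.crt instead), and DRY_RUN is a hypothetical switch that prints the pipeline instead of running it.

```shell
#!/bin/sh
# Sketch: print the subject and validity dates of the Linkerd issuer certificate.
# Namespace, secret name, and key are assumed defaults; adjust to your install.
set -eu

LINKERD_NAMESPACE="${LINKERD_NAMESPACE:-linkerd}"
ISSUER_SECRET="${ISSUER_SECRET:-linkerd-identity-issuer}"
# Key inside the secret that holds the certificate; the dot is escaped for jsonpath.
CERT_KEY='crt\.pem'

issuer_cert_dates() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Only show what would be executed.
    printf '+ %s\n' "kubectl -n $LINKERD_NAMESPACE get secret $ISSUER_SECRET -o jsonpath={.data.$CERT_KEY} | base64 -d | openssl x509 -noout -subject -dates"
  else
    # Fetch the base64-encoded certificate, decode it, and print its dates.
    kubectl -n "$LINKERD_NAMESPACE" get secret "$ISSUER_SECRET" \
      -o "jsonpath={.data.$CERT_KEY}" \
      | base64 -d \
      | openssl x509 -noout -subject -dates
  fi
}

issuer_cert_dates
```

With DRY_RUN=0 the notBefore and notAfter lines show whether the certificate is currently within its validity window.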
How can it be reproduced?
To reproduce the issue, manually rotate the OIDC signing keys on a new cluster.
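A rough sketch of that reproduction is below. The resource group and cluster names are hypothetical placeholders, the run helper and DRY_RUN switch are illustrative conveniences, and a single rotation is an assumption: it is not confirmed that one rotation is enough to trigger the error.

```shell
#!/bin/sh
# Sketch: rotate the AKS OIDC signing keys, then look for sandbox failures.
# RESOURCE_GROUP and CLUSTER_NAME are hypothetical placeholders.
set -eu

RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"

# Hypothetical helper: print commands by default, run them when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

reproduce_rotation() {
  # Manually rotate the service account signing keys.
  run az aks oidc-issuer rotate-signing-keys \
    --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"
  # After creating a new pod, list events matching the error from this report.
  run kubectl get events --all-namespaces \
    --field-selector reason=FailedCreatePodSandBox
}

reproduce_rotation
```

The events query only returns something once a new pod has been scheduled after the rotation, so create a test pod in a meshed namespace before running the second command.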
Logs, error output, etc
Linkerd control plane
[ 0.105506s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 0.306969s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 0.710647s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 1.211775s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 1.713047s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 2.215585s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 2.716391s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 3.217705s] WARN ThreadId(01) watch{port=8086}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
Output of linkerd check -o short
N/A
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
yes