
Istio upgrade from 1.18.5 to 1.21.1 fails #50660

Open · 2 tasks done
ajaykumarmandapati opened this issue Apr 24, 2024 · 5 comments
Labels
area/upgrade Issues related to upgrades

Comments

@ajaykumarmandapati

ajaykumarmandapati commented Apr 24, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

We have long been waiting to upgrade from 1.18.5, which is EOL, to a version that is compatible with Bottlerocket nodes.
However, the upgrade to 1.21.1 now fails with the log message below from the istio-validation container.

Note: we have to upgrade to 1.21.1 since we use Bottlerocket nodes in an AWS EKS environment - #50198

2024-04-24T07:45:56.651101Z	info	Starting iptables validation. This check verifies that iptables rules are properly established for the network.
2024-04-24T07:45:56.651169Z	info	Listening on [::1]:15001
2024-04-24T07:45:56.651435Z	info	Listening on [::1]:15006
2024-04-24T07:46:01.651924Z	error iptables validation failed; workload is not ready for Istio.
When using Istio CNI, this can occur if a pod is scheduled before the node is ready.

If installed with 'cni.repair.deletePods=true', this pod should automatically be deleted and retry.
Otherwise, this pod will need to be manually removed so that it is scheduled on a node with istio-cni running, allowing iptables rules to be established.
2024-04-24T07:46:01.652186273Z

Below are the logs from the istio-cni-node DaemonSet:

# Completed on Wed Apr 24 07:42:42 2024
2024-04-24T07:42:42.401663Z	info	cni	============= End iptables configuration for echoserver-55ff5756b8-mtzwx =============
2024-04-24T07:42:52.511775Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.512351Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 1): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.517605Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.517993Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 2): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.528325Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.528826Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 3): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.549094Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.549508Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 4): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.589751Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.590244Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, and retry budget exceeded: get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods

Version

istioctl version                                                                                                                                       
client version: 1.21.1
control plane version: 1.21.1
data plane version: 1.18.5 (3 proxies), 1.21.1 (1 proxies)

kubectl version                                                                                                                                        
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3-eks-adc7111

helm version --short                                                                                                                                   
v3.11.1+g293b50c

Additional Information

bug-report.tar.gz

@istio-policy-bot istio-policy-bot added the area/upgrade Issues related to upgrades label Apr 24, 2024
@subpathdev

We ran into the same issue, also running EKS with Bottlerocket as the AMI type.
Our upgrade path was a little different, as we upgraded from 1.20.5 (no issues) to 1.21.1, where this issue appeared.

Some additional information:

  • By deleting the pods manually via kubectl delete, we are able to start the failing pod successfully (see the sketch after this list).
  • The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).
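
A minimal sketch of that manual workaround, using the pod name from the logs in this issue (substitute your own namespace and pod):

# Delete the stuck pod; it is recreated and should start cleanly once
# istio-cni-node is ready on the node it lands on.
kubectl delete pod echoserver-55ff5756b8-mtzwx -n default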

@ajaykumarmandapati
Author

We ran into the same issue, also running EKS with Bottlerocket as the AMI type. Our upgrade path was a little different, as we upgraded from 1.20.5 (no issues) to 1.21.1, where this issue appeared.

Some additional information:

  • By deleting the pods manually via kubectl delete, we are able to start the failing pod successfully.
  • The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

I am unable to run 1.20.5; I get the same error: "error iptables validation failed; workload is not ready for Istio. When using Istio CNI, this can occur if a pod is scheduled before the node is ready." Did you have to change any Helm values to run 1.20.5?

@bleggett
Contributor

bleggett commented Apr 29, 2024

* The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

This specifically is likely a problem with the CNI not actually being installed and reaching a ready state on the node before pods are scheduled onto it.

#40303

The way to manage this sort of thing in K8S is with node taints, to prevent the runtime from starting pods before the node is ready.
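
A hedged sketch of that taint-based approach; the taint key and node name below are purely illustrative (in practice the taint would be applied at node registration, e.g. via the kubelet --register-with-taints flag or the EKS node group configuration, and removed by a controller once istio-cni-node reports ready):

# The node registers with a NoSchedule taint so ordinary pods are not started yet;
# DaemonSets such as istio-cni-node need a matching toleration so they can still run.
kubectl taint nodes ip-10-0-1-23.eu-central-1.compute.internal \
  example.io/cni-not-ready=true:NoSchedule

# Once the istio-cni-node pod on that node is Ready, clear the taint so
# workloads are only scheduled after iptables redirection can be set up.
kubectl taint nodes ip-10-0-1-23.eu-central-1.compute.internal \
  example.io/cni-not-ready=true:NoSchedule-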

#48818 should help with this.

@subpathdev
Copy link

We ran into the same issue, also running EKS with Bottlerocket as the AMI type. Our upgrade path was a little different, as we upgraded from 1.20.5 (no issues) to 1.21.1, where this issue appeared.
Some additional information:

  • By deleting the pods manually via kubectl delete, we are able to start the failing pod successfully.
  • The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

I am unable to run 1.20.5; I get the same error: "error iptables validation failed; workload is not ready for Istio. When using Istio CNI, this can occur if a pod is scheduled before the node is ready." Did you have to change any Helm values to run 1.20.5?

We did not modify our settings when moving from 1.19.5 to 1.20.5. However, prior to 1.21.1, pods in an error state were automatically removed for us thanks to the configuration flag values.cni.repair.deletePods; perhaps you have disabled this?
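
For reference, a hedged sketch of setting that flag at install time; the value path follows the flag referenced above, and the exact path depends on how Istio and istio-cni are installed (istioctl vs. the Helm charts):

# Re-enable automatic deletion of pods that started before istio-cni was
# ready, so they are recreated with working iptables rules.
istioctl install --set values.cni.repair.deletePods=true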

@subpathdev

* The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

This specifically is likely a problem with the CNI not actually being installed and reaching a ready state on the node before pods are scheduled onto it.

#40303

The way to manage this sort of thing in K8S is with node taints, to prevent the runtime from starting pods before the node is ready.

#48818 should help with this.

As far as I understand the change in PR #48818, it could resolve our issue. It would be nice to have some documentation about this feature/setting as well.
