
Istio upgrade from 1.18.5 to 1.21.1 fails #50660

Open · 2 tasks done
ajaykumarmandapati opened this issue Apr 24, 2024 · 5 comments
Labels
area/upgrade Issues related to upgrades

Comments

@ajaykumarmandapati

ajaykumarmandapati commented Apr 24, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

We have long been waiting to upgrade from 1.18.5, which is EOL, to a version that is compatible with Bottlerocket nodes.
However, the upgrade to 1.21.1 now fails with the log message below from the istio-validation container.

Note: we have to upgrade to 1.21.1 since we use Bottlerocket nodes in an AWS EKS environment - #50198

2024-04-24T07:45:56.651101Z	info	Starting iptables validation. This check verifies that iptables rules are properly established for the network.
2024-04-24T07:45:56.651169Z	info	Listening on [::1]:15001
2024-04-24T07:45:56.651435Z	info	Listening on [::1]:15006
2024-04-24T07:46:01.651924Z	error iptables validation failed; workload is not ready for Istio.
When using Istio CNI, this can occur if a pod is scheduled before the node is ready.

If installed with 'cni.repair.deletePods=true', this pod should automatically be deleted and retry.
Otherwise, this pod will need to be manually removed so that it is scheduled on a node with istio-cni running, allowing iptables rules to be established.
2024-04-24T07:46:01.652186273Z

Below are the logs from the istio-cni-node DaemonSet:

# Completed on Wed Apr 24 07:42:42 2024
2024-04-24T07:42:42.401663Z	info	cni	============= End iptables configuration for echoserver-55ff5756b8-mtzwx =============
2024-04-24T07:42:52.511775Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.512351Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 1): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.517605Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.517993Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 2): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.528325Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.528826Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 3): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.549094Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.549508Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, retrying (retry count: 4): get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods
2024-04-24T07:42:52.589751Z	info	repair	Repairing pod...	pod=default/echoserver-55ff5756b8-mtzwx
2024-04-24T07:42:52.590244Z	error	controllers	error handling default/echoserver-55ff5756b8-mtzwx, and retry budget exceeded: get netns: in host network: network id: find link for 2a05:d014:1d16:3d05:7696::5: no routes found for 2a05:d014:1d16:3d05:7696::5	controller=repair pods

Version

istioctl version                                                                                                                                       
client version: 1.21.1
control plane version: 1.21.1
data plane version: 1.18.5 (3 proxies), 1.21.1 (1 proxies)

kubectl version                                                                                                                                        
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3-eks-adc7111

helm version --short                                                                                                                                   
v3.11.1+g293b50c

Additional Information

bug-report.tar.gz

@istio-policy-bot istio-policy-bot added the area/upgrade Issues related to upgrades label Apr 24, 2024
@subpathdev

We ran into the same issue, also running EKS with Bottlerocket as the AMI type.
Our upgrade path was a little different, as we upgraded from 1.20.5 (no issues) to 1.21.1, where this issue appeared.

Some additional information:

  • By deleting the pods manually via kubectl delete, we are able to start the failing pod successfully (see the sketch after this list).
  • The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).
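
A minimal sketch of that manual workaround, using the pod name from the logs in this issue (substitute your own namespace and pod):

# Delete the stuck pod; it is recreated and should start cleanly once
# istio-cni-node is ready on the node it lands on.
kubectl delete pod echoserver-55ff5756b8-mtzwx -n default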

@ajaykumarmandapati
Author

We ran into the same issue, also running EKS with Bottlerocket as the AMI type. Our upgrade path was a little different, as we upgraded from 1.20.5 (no issues) to 1.21.1, where this issue appeared.

Some additional information:

  • By deleting the pods manually via kubectl delete, we are able to start the failing pod successfully.
  • The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

I am unable to run 1.20.5; I get the same error: "error iptables validation failed; workload is not ready for Istio. When using Istio CNI, this can occur if a pod is scheduled before the node is ready." Did you have to change any Helm values to run 1.20.5?

@bleggett
Contributor

bleggett commented Apr 29, 2024

* The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

This specifically is likely a problem with the CNI not actually being installed and reaching a ready state on the node before pods are scheduled onto it.

#40303

The way to manage this sort of thing in K8S is with node taints, to prevent the runtime from starting pods before the node is ready.
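
A hedged sketch of that taint-based approach; the taint key and node name below are purely illustrative (in practice the taint would be applied at node registration, e.g. via the kubelet --register-with-taints flag or the EKS node group configuration, and removed by a controller once istio-cni-node reports ready):

# The node registers with a NoSchedule taint so ordinary pods are not started yet;
# DaemonSets such as istio-cni-node need a matching toleration so they can still run.
kubectl taint nodes ip-10-0-1-23.eu-central-1.compute.internal \
  example.io/cni-not-ready=true:NoSchedule

# Once the istio-cni-node pod on that node is Ready, clear the taint so
# workloads are only scheduled after iptables redirection can be set up.
kubectl taint nodes ip-10-0-1-23.eu-central-1.compute.internal \
  example.io/cni-not-ready=true:NoSchedule-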

#48818 should help with this.

@subpathdev
Copy link

We ran into the same issue, also running EKS with Bottlerocket as the AMI type. Our upgrade path was a little different, as we upgraded from 1.20.5 (no issues) to 1.21.1, where this issue appeared.
Some additional information:

  • By deleting the pods manually via kubectl delete, we are able to start the failing pod successfully.
  • The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

I am unable to run 1.20.5; I get the same error: "error iptables validation failed; workload is not ready for Istio. When using Istio CNI, this can occur if a pod is scheduled before the node is ready." Did you have to change any Helm values to run 1.20.5?

We did not modify our settings when moving from 1.19.5 to 1.20.5. However, prior to 1.21.1, pods in an error state were automatically removed for us thanks to the configuration flag values.cni.repair.deletePods; perhaps you have disabled this?
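
For reference, a hedged sketch of setting that flag at install time; the value path follows the flag referenced above, and the exact path depends on how Istio and istio-cni are installed (istioctl vs. the Helm charts):

# Re-enable automatic deletion of pods that started before istio-cni was
# ready, so they are recreated with working iptables rules.
istioctl install --set values.cni.repair.deletePods=true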

@subpathdev

* The issue occurs only on new nodes, and only when the pod is scheduled shortly after node creation (in our case, about 29s after).

This specifically is likely a problem with the CNI not actually being installed and reaching a ready state on the node before pods are scheduled onto it.

#40303

The way to manage this sort of thing in K8S is with node taints, to prevent the runtime from starting pods before the node is ready.

#48818 should help with this.

As far as I understand the change in PR #48818, it could resolve our issue. It would be nice to have some documentation about this feature/setting as well.
