the cluster updates are not flowing to all the gateway pods #50670
Were istiod-5756dc65b7-pl9ns and istiod-5756dc65b7-sw8sc (not sent) different in any way from istiod-5756dc65b7-42jmx (synced)? Can you reproduce this if everything is running 1.19.10? I see you have 1.17.3 (12 proxies); that's two minor versions behind.
Nothing stood out to me, apart from some duplicate ServiceEntry resources that were causing errors on all of the istiod pods.
We do not have control over the proxy pods; they are moved to the new version when their deployment is updated. I'm not sure I can reproduce this anyway.
Do you have full Envoy proxy logs for the proxies with the issue?
Attached [removed the actual traffic logs and the messages of
Oh, so they are rejected:
I didn't know about that constraint, and we don't validate or document it, but clearly we should.
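The constraint is presumably the one Envoy documents for `MaglevLbConfig`: `table_size` must be a prime number no larger than 5000011, and defaults to 65537 when unset. The following is a minimal sketch of the kind of validation that was missing, assuming the value comes from a DestinationRule's `consistentHash.maglev.tableSize` field. The function names are hypothetical and are not part of Istio or Envoy.

```python
from typing import List, Optional

# Upper bound Envoy documents for MaglevLbConfig.table_size.
MAX_MAGLEV_TABLE_SIZE = 5_000_011


def is_prime(n: int) -> bool:
    """Deterministic trial division; fast enough for n <= 5,000,011."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def validate_maglev_table_size(table_size: Optional[int]) -> List[str]:
    """Hypothetical pre-apply check; an empty list means the value looks acceptable."""
    if table_size is None:
        return []  # unset: Envoy falls back to its default (65537)
    problems = []
    if table_size > MAX_MAGLEV_TABLE_SIZE:
        problems.append(f"tableSize {table_size} exceeds {MAX_MAGLEV_TABLE_SIZE}")
    if not is_prime(table_size):
        problems.append(f"tableSize {table_size} is not prime")
    return problems
```

For example, `validate_maglev_table_size(65536)` reports that 65536 is not prime, while the default 65537 passes. Running a check like this in CI before applying DestinationRules would catch the bad value before Envoy rejects the cluster update.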
The above problem is fixed. It was caused by one of the DestinationRules, and the affected proxy was stuck (Envoy was not coming up).
Do you mean you fixed the Maglev error and still see the original issue of clusters not being sent? Or are both fixed?
Yes, the Maglev issue is fixed now, but cluster updates still don't happen unless the proxies are restarted.
Ah, got it. Is this happening repeatedly, or do you have a few stuck pods that you haven't restarted yet? If it's happening repeatedly, can you send a new log now that the Maglev issue is fixed, to make sure that wasn't somehow interfering?
I still have some pods that I have not restarted and that are still broken.
@howardjohn Is there anything else we can check here?
@bseenu It's a bit hard to tell, because it could easily be caused by the Maglev issue alone. It would help if you could reproduce it now that that error isn't present, and include the istiod logs.
Adding a bit more information about this issue, plus steps to reproduce it: I looked at two proxies from the same ingress, where one pod was not getting CDS updates sent to it. Replica info from the /debug/syncz endpoint for the two proxies:
Looking at the Envoy config dump from the "stuck" replica, we can see that it still has the incorrect Maglev table size.
Steps to reproduce this issue.
Relevant snapshot of /debug/syncz from istiod
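The "not sent" and "synced" labels used earlier in the thread are per-type states of the kind `istioctl proxy-status` reports. The following is a rough sketch of that classification applied to CDS, which could flag stuck replicas without restarting them. It assumes the /debug/syncz JSON is a list of objects with `proxy`, `cluster_sent` and `cluster_acked` fields. Those field names vary between Istio releases, so treat them as assumptions. The helper names are hypothetical.

```python
import json
from typing import Dict


def xds_status(sent: str, acked: str) -> str:
    """Classify one xDS type from its last sent and last acked nonce."""
    if not sent:
        return "NOT SENT"
    if sent == acked:
        return "SYNCED"
    if not acked:
        return "STALE (Never Acknowledged)"
    return "STALE"


def cds_status_by_proxy(syncz_json: str) -> Dict[str, str]:
    """Map each proxy ID to its CDS sync status.

    Assumed field names ("proxy", "cluster_sent", "cluster_acked");
    check them against the /debug/syncz output of your istiod version.
    """
    return {
        entry.get("proxy", "<unknown>"): xds_status(
            entry.get("cluster_sent", ""), entry.get("cluster_acked", "")
        )
        for entry in json.loads(syncz_json)
    }
```

Each proxy holds its xDS connection to a single istiod replica, so a check like this would need to run against /debug/syncz from every istiod pod to cover all gateway replicas.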
Is this the right place to submit this?
Bug Description
Version
Additional Information
I fixed this by restarting (deleting) the gateway pods that did not have the updated cluster info. Looking at the proxy logs, the last CDS update happened about 20 days ago. Why does this happen, and how can it be handled?