Dying Pods break NetworkChaos (and perhaps others that inject into the pid) #1446
Comments
Hi @torblerone, currently this is the expected behavior of combining PodChaos and NetworkChaos. If you want to apply different chaos to the same set of target pods, we do not guarantee that it works correctly yet. As you mentioned, behaviors like "injecting Chaos A into some pods, then injecting Chaos B into the rest" are a kind of specification of chaos experiment targets. Unfortunately, we do not support that spec yet. You could apply different labels to split the pods and use labelSelector to divide the chaos experiment targets. I am going to close this issue now; if you have other ideas, feel free to reopen it. 😁
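For illustration, a minimal sketch of that labelSelector workaround. The namespace, labels, and experiment names below are hypothetical, and exact field requirements vary between Chaos Mesh versions: Pods labeled `chaos-group: a` only ever receive NetworkChaos, and Pods labeled `chaos-group: b` only ever receive PodChaos, so the two experiments can never hit the same Pod.

```yaml
# Hypothetical split: NetworkChaos targets only chaos-group=a Pods...
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: delay-group-a
  namespace: chaos-testing
spec:
  action: delay
  mode: all
  selector:
    namespaces:
      - my-app            # hypothetical target namespace
    labelSelectors:
      chaos-group: "a"    # hypothetical label applied beforehand
  delay:
    latency: "100ms"
  duration: "5m"
---
# ...while PodChaos targets only chaos-group=b Pods.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-group-b
  namespace: chaos-testing
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - my-app
    labelSelectors:
      chaos-group: "b"
```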
I prefer to regard it as a …
I haven't figured out a reasonable behavior for this combination of more than one type of chaos. 🤔 Any suggestions? @YangKeao
I didn't do a deep dive into the chaos-controller, but doesn't it hold some kind of information about which Pods are affected by a certain type of chaos? If so, you could block any new chaos experiment from running on those Pods by implementing a mechanism that "blocks" already-affected Pods. Otherwise, the chaos-daemon itself could have a mechanism that prevents affecting the same Pod with two types of chaos.
I discovered that when we deploy a new version of a service (which happens very often, especially on our DEV environment) while NetworkChaos is running concurrently, the NetworkChaos also breaks. You can probably reproduce this by deploying a new image (or similar) to a Deployment while NetworkChaos is targeting the current Pods, which run the old version and will be replaced, e.g., during a rolling upgrade.
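A minimal sketch of that reproduction, with hypothetical names: while a NetworkChaos experiment targets the `app: my-app` Pods, bumping the image tag and re-applying the Deployment triggers a rolling update that replaces every targeted Pod mid-experiment.

```yaml
# Hypothetical Deployment; re-applying it with a new image tag while
# NetworkChaos targets its Pods replaces them and breaks the experiment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2  # bumped from v1; triggers a rolling update
```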
Thanks for your report. Yes, this is a known issue for us. We have tried several times (and several pull requests) to fix it, but those PRs were all too complicated or too expensive, and in the end none of them got merged. But once the …
I am experiencing the same issue and am wondering if a workaround was ever found for this.
Pretty incredible how impossibly stuck it gets. I have an HTTPChaos resource in my cluster that failed four days ago because a Pod restarted due to the error. It has been completely stuck ever since: it cannot be deleted, started, or paused, so it's impossible to work with. The entire namespace is stuck deleting, and when I force the deletion through a workaround from Red Hat's website, the resource comes back after the namespace is recreated. It produces error messages every couple of seconds and has amassed a good few thousand of them by now.
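When a chaos resource is stuck like this, the usual culprit is a finalizer the controller can no longer process. Below is a hedged, generic sketch of the force-delete route; the resource kind, name, and namespace are hypothetical, and Chaos Mesh releases also document their own forced-cleanup mechanism, so check the docs for your version first.

```yaml
# clear-finalizers.yaml -- merge patch that empties the finalizer list so
# deletion can complete. Last resort only: skipping finalizers can leave
# injected chaos (e.g. tc/iptables rules) behind on the node.
# Apply with, e.g.:
#   kubectl patch httpchaos my-http-chaos -n my-app --type merge --patch-file clear-finalizers.yaml
metadata:
  finalizers: []
```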
Bug Report
What version of Kubernetes are you using?
What version of Chaos Mesh are you using?
What did you do?
I've created three types of Chaos experiments: Pod Kill, Pod Failure and NetworkChaos.
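For concreteness, a minimal sketch (hypothetical names, namespace, and labels) of how such experiments overlap: both draw random victims from the same label set, so PodKill can kill a Pod that NetworkChaos is currently injected into.

```yaml
# Both experiments pick random targets from the same Pods, so their
# victims can overlap.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: random-pod-kill
  namespace: chaos-testing
spec:
  action: pod-kill
  mode: random-max-percent
  value: "50"
  selector:
    namespaces:
      - my-app
    labelSelectors:
      app: my-app
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: random-delay
  namespace: chaos-testing
spec:
  action: delay
  mode: random-max-percent
  value: "50"
  selector:
    namespaces:
      - my-app
    labelSelectors:
      app: my-app    # same Pods as the pod-kill experiment above
  delay:
    latency: "100ms"
  duration: "5m"
```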
What did you expect to see?
I expected the daemons to somehow talk to each other and inform each other about their targets, or at least for the controller to keep some kind of overview of (accessible) targets.
What did you see instead?
When NetworkChaos tries to run against a specified target (a random set of Pods from a namespace/deployment combination) while, at roughly the same time, PodKill or PodFailure is running (also against a random set of Pods from the same namespace/deployment combination) and kills the Pods that NetworkChaos was targeting, the NetworkChaos experiment breaks so badly that I have to re-apply the experiment YAML. Otherwise, the experiment keeps trying to run on the same Pods from its last run, which were already killed by PodKill.
The dashboard shows the following error on the NetworkChaos experiment:
Output of chaosctl