Operator behaving differently running in cluster compared to out of cluster #6678

Open
coillteoir opened this issue Feb 11, 2024 · 4 comments

coillteoir commented Feb 11, 2024

Type of question

General operator-related help

Question

I am creating an operator to work with a CI/CD system. When I run it locally, it creates pods as expected. But when I deploy it to the cluster, it fails to check whether a pod has already been created and creates multiple pods for the same "task".

Pipeline Spec: [screenshot]

Locally, using make run: [screenshot]

In cluster, after pushing the Docker image and using make deploy: [screenshot]

What did you do?

To run individual tasks in a pipeline, I wrote a function which uses DFS to walk a tree data structure and checks the status of child pods before generating a new pod for that task.
The operator then loops over the generated list of pods and creates them in the cluster.
[screenshot of the function]
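
For reference, a minimal sketch of the kind of existence/status check described above, assuming a deterministic per-task pod name and the standard controller-runtime client; the helper name and package are illustrative, not the actual bramble code:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// taskPodState reports whether the pod that would run a task already exists
// and whether it has finished successfully, so the DFS can decide whether a
// new pod needs to be generated for that task.
func taskPodState(ctx context.Context, c client.Client, namespace, podName string) (exists bool, succeeded bool, err error) {
	var pod corev1.Pod
	err = c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: podName}, &pod)
	if apierrors.IsNotFound(err) {
		return false, false, nil
	}
	if err != nil {
		return false, false, err
	}
	return true, pod.Status.Phase == corev1.PodSucceeded, nil
}
```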

What did you expect to see?

The correct number of pods being created.

What did you see instead? Under which circumstances?

Multiple pods being created and the pipeline not being validated.

Environment

Operator type:

/language go

Kubernetes cluster type:

$ operator-sdk version

1.33

$ go version

1.22

$ kubectl version

1.29

Additional context

Current branch for the bug: https://github.com/coillteoir/bramble/tree/develop
The bug is in the execution group of controllers.
It occurs in both Kind and minikube.

openshift-ci bot added the language/go label on Feb 11, 2024
coillteoir (Author) commented

I'm unsure where to start with this issue, in particular whether it's a bug in my code or in an upstream library such as controller-runtime.

jberkhahn (Contributor) commented

So, reconciliation loops aren't really run in a deterministic manner - multiple controllers might pick up the same event and try to reconcile it, which is why it's always a good idea to check the state of the system before trying to modify it. It looks like you're just always firing off this function that tries to create a bunch of pods.

Not sure why you're experiencing different behavior on/off cluster, though. It might just be that the increased latency means fewer controller loops are firing, or something like that.
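
To make the "check the state of the system before modifying it" point concrete, here is a minimal sketch of an idempotent create, assuming deterministically named per-task pods and the kubebuilder-scaffolded reconciler fields (embedded client plus r.Scheme); the helper and argument names are illustrative only:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensurePod creates the desired pod only if it does not already exist, so a
// repeated or concurrent reconcile of the same event cannot produce duplicates.
func (r *PipelineReconciler) ensurePod(ctx context.Context, owner client.Object, desired *corev1.Pod) error {
	// Owning the pod ties its lifecycle to the pipeline and, if the pod type
	// is watched with Owns(), lets pod changes trigger further reconciles.
	if err := controllerutil.SetControllerReference(owner, desired, r.Scheme); err != nil {
		return err
	}
	if err := r.Create(ctx, desired); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}
```

With deterministic pod names, the API server's AlreadyExists error acts as the final guard even if two reconciles race past the same pre-create check.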

jberkhahn added the triage/support label on Feb 12, 2024
jberkhahn self-assigned this on Feb 12, 2024
jberkhahn added this to the Backlog milestone on Feb 12, 2024
coillteoir (Author) commented

Just curious, is the controller runtime synchronous or does it use goroutines under the hood? And if it does, would there be a way to force my reconcile loop to wait for the controller to finish provisioning/getting resources before continuing?
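
For what it's worth, the usual controller-runtime pattern is not to wait inside Reconcile at all: each pass re-reads cluster state, and if things aren't ready yet it returns a requeue so a later pass can try again. A minimal sketch, assuming the scaffolded PipelineReconciler with an embedded client and an assumed label tying pods to their pipeline:

```go
package controllers

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Reconcile re-checks cluster state on every pass; when the child pods are
// not finished yet it asks to be requeued instead of blocking in-process.
func (r *PipelineReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var pods corev1.PodList
	// "bramble.dev/pipeline" is an assumed label, not the actual bramble API.
	if err := r.List(ctx, &pods, client.InNamespace(req.Namespace),
		client.MatchingLabels{"bramble.dev/pipeline": req.Name}); err != nil {
		return ctrl.Result{}, err
	}

	for _, p := range pods.Items {
		if p.Status.Phase != corev1.PodSucceeded {
			// Not done yet: come back later rather than waiting here.
			return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
		}
	}
	return ctrl.Result{}, nil
}
```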

openshift-bot commented

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label on May 15, 2024