Flakiness in Jenkins tiles #398

Open
cachedout opened this issue Jul 13, 2021 · 3 comments

@cachedout

Hello!

First, nice work on this project. It's extremely well-designed and very easy to work with.

I'm experiencing a strange problem. My tiles are organized into groups, each containing between 5 and 20 Jenkins jobs. Monitoror starts up fine, but after a minute or two I start seeing jobs flap between failure and success, even though nothing has changed in the job itself on the Jenkins side. (I have verified this repeatedly just to be sure.)

I have experimented with the core cache values but to no avail. I'm still seeing certain jobs flap between success and failure.

I'm certainly willing to believe that Monitoror is doing the right thing here and Jenkins is failing to send the correct API response, but my question is -- how can I tell?

Is there debug logging in Monitoror which can be enabled to watch responses as they are returned? If not, would you consider adding a flag to enable it?

My second question is about the caching options. Do they control the randomization splay for requests or just the rate at which those requests are made upstream? If it's the latter, is there any way to increase the amount of splay between upstream requests?

Thanks very much in advance.

@cachedout

I should point out that I believe I'm seeing multiple tiles update at once, which suggests there may be places where additional splay and randomization need to be added. I'll keep an eye on this and see if I can confirm this behavior.
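
To make the idea of "splay" concrete, here is a minimal sketch of per-tile jitter. It is not Monitoror's code or configuration. The function name `splayed_schedule` and its parameters are hypothetical, and it only illustrates how a random offset per tile keeps refreshes from landing on the same instant.

```python
import random


def splayed_schedule(tile_ids, interval, max_splay, rng=None):
    """Return {tile_id: delay_seconds} for the next refresh of each tile.

    Hypothetical helper: each tile waits the base interval plus a random
    offset in [0, max_splay], so the refreshes are spread out in time.
    """
    rng = rng or random.Random()
    return {tile_id: interval + rng.uniform(0, max_splay) for tile_id in tile_ids}


if __name__ == "__main__":
    # Example: three tiles, 30 s base interval, up to 10 s of splay.
    for tile, delay in splayed_schedule(["job-a", "job-b", "job-c"], 30.0, 10.0).items():
        print(f"{tile}: refresh in {delay:.1f}s")
```

With `max_splay` set to 0 every tile refreshes together, which is the burst pattern described above. A larger value spreads the same number of requests over a wider window.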

@cachedout

From watching the requests from the browser to the app, here's what comes back when a job suddenly becomes flaky. (Sensitive information snipped out):

{"type":"JENKINS-BUILD","status":"FAILURE","label":"<snipped>","message":"unable to find job","build":{"branch":"master"}}
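
One way to tell a real build failure from a lookup failure is to record responses like the one above and count status changes per tile. The sketch below assumes the responses have been saved as JSON strings in arrival order. The function name `count_flaps` is hypothetical, and only the `label`, `status`, and `message` fields shown in the captured response are used.

```python
import json


def count_flaps(raw_responses):
    """Count status changes per tile label across responses in arrival order.

    Returns (flaps, messages): flaps maps label -> number of status changes,
    messages maps label -> set of "message" values seen for that label.
    """
    last_status = {}
    flaps = {}
    messages = {}
    for raw in raw_responses:
        tile = json.loads(raw)
        label = tile.get("label")
        status = tile.get("status")
        if "message" in tile:
            messages.setdefault(label, set()).add(tile["message"])
        if label in last_status and last_status[label] != status:
            flaps[label] = flaps.get(label, 0) + 1
        last_status[label] = status
    return flaps, messages
```

A tile whose failures always carry a message such as "unable to find job" is more likely failing on the upstream request than on the build itself.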

@cachedout

cachedout commented Jul 13, 2021

We've (possibly) tracked this down to nginx in front of the Jenkins instance rate-limiting Monitoror. However, I still believe this may be a bug as I believe we shouldn't be seeing Monitoror send bursts of requests quite so aggressively. :-/
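
A rough way to check the rate-limiting theory from outside Monitoror is to send a short burst of requests to the same upstream URL and look at the status codes. This is a sketch under assumptions: `probe` and `looks_rate_limited` are hypothetical helpers, the URL is whatever Jenkins endpoint sits behind the proxy, and the codes treated as rate limiting are 503 (the default rejection status of nginx's `limit_req`) and 429 (a common configured alternative). A 503 can also come from other causes, so treat a match as a hint only.

```python
import urllib.error
import urllib.request

# Assumed set of "rate limited" codes; adjust to the proxy's configuration.
RATE_LIMIT_CODES = {429, 503}


def looks_rate_limited(status_code):
    """Return True if the status code is one this sketch treats as rate limiting."""
    return status_code in RATE_LIMIT_CODES


def probe(url, attempts=10, timeout=5.0):
    """Request the URL several times back to back and collect the HTTP status codes."""
    codes = []
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as err:
            # Non-2xx responses are raised as HTTPError; keep their status code.
            codes.append(err.code)
    return codes
```

If a burst from `probe` starts returning codes for which `looks_rate_limited` is true while single requests succeed, the proxy is the likely source of the intermittent failures.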
