Print "no action" run_cycle message at most every minute #2161
base: master
We can accept such a change only if it is possible to configure the behavior.
Hrm, in that case, maybe having a config flag
But a boolean can't describe the implemented behavior...
Another thing I don't like is the changed severity of https://github.com/zalando/patroni/pull/2161/files#diff-b6a3a0ef64783a6d32feb908d74a3a009e9f25008b8ca5313c9c9747e3b74a63R174. Currently it is visible only when something is different from normal. Lines 178 to 185 in 3e1076a
Yes, I would change the PR to either print the "no action" heartbeat (default) or not (if that new config option is set). If you think having a threshold parameter (and my original implementation) is better, then OK, I will look into it.
You're right, this change is bogus - I saw it printed on the standby and assumed it would be printed every time, thanks for pointing that out.
The logic in log.py knows that messages normally come in pairs,
Can you clarify what you think is best here?
The thing is that heartbeat logs are super useful when something goes wrong. When everything is normal, we get logs every loop_wait seconds, give or take a few ms. If you see the log time drift by a few seconds, that is already an indicator of a problem. Of course, you won't be able to tell what exactly was wrong, but knowing when the problem started always helps to investigate further by looking into the postgres logs or the logs of the DCS. Right now Lines 178 to 185 in 3e1076a
This code "detects" that everything is normal by analyzing messages: if the "Lock owner:" message is followed by the "no action." message, the first one is "removed".
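To make the pairing idea concrete, here is a minimal sketch of that kind of filtering. It is not the code from log.py; the function name `collapse_pairs` and the sample message texts are assumptions made for illustration, and only the two prefixes come from the comment above.

```python
def collapse_pairs(messages):
    """Drop a 'Lock owner: ' message when the very next message is 'no action. '.

    Hypothetical helper: it works on a plain list of strings, not on the
    logging queue that Patroni actually uses.
    """
    result = []
    for i, msg in enumerate(messages):
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        is_normal_pair = (
            msg.startswith('Lock owner: ')
            and nxt is not None
            and nxt.startswith('no action. ')
        )
        if is_normal_pair:
            # Everything looks normal, so the first half of the pair is skipped.
            continue
        result.append(msg)
    return result
```

When the "Lock owner:" line is followed by anything other than "no action.", both lines are kept, which matches the point that the first message is only visible when something differs from normal.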
If you do filtering of the But, we can do better than that. Instead of simply discarding 9 out of 10
In case a log anomaly is detected, we will first output all accumulated messages from the
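A rough sketch of how "accumulate instead of discard" could look, under several assumptions: the class name `HeartbeatBuffer`, the `tolerance` and `maxlen` parameters, and the definition of an anomaly (a non-heartbeat message, or a heartbeat whose distance from the previous one is off from loop_wait by more than `tolerance` seconds) are all made up for illustration and are not part of this PR.

```python
from collections import deque


class HeartbeatBuffer:
    """Hold back normal heartbeats; release them all when an anomaly shows up."""

    def __init__(self, loop_wait=10.0, tolerance=1.0, maxlen=10):
        self.loop_wait = loop_wait
        self.tolerance = tolerance
        # Bounded buffer: the oldest heartbeats fall out when it is full.
        self.pending = deque(maxlen=maxlen)
        self.last_ts = None

    def feed(self, ts, message, heartbeat):
        """Return the list of (ts, message) tuples that should be printed now."""
        late = (
            heartbeat
            and self.last_ts is not None
            and abs((ts - self.last_ts) - self.loop_wait) > self.tolerance
        )
        if heartbeat:
            self.last_ts = ts
        if heartbeat and not late:
            # Normal heartbeat: keep it for later, print nothing.
            self.pending.append((ts, message))
            return []
        # Anomaly: first the accumulated heartbeats, then the current message.
        out = list(self.pending) + [(ts, message)]
        self.pending.clear()
        return out
```

The sketch leaves out any periodic "still alive" output, so it would have to be combined with something like a once-per-minute heartbeat to keep the log from going completely silent.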
I was getting annoyed at the overly chatty Patroni log when there are no problems and loop_wait is at the default 10 seconds. So this prints the "no action.[...]" line only once every minute.
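For illustration only, the "at most once a minute" behavior can be sketched with a standard `logging.Filter`. This is not the implementation in this PR; the class name `HeartbeatThrottle` and its `prefix`, `interval` and `clock` parameters are hypothetical.

```python
import logging
import time


class HeartbeatThrottle(logging.Filter):
    """Let messages starting with `prefix` through at most once per `interval` seconds."""

    def __init__(self, prefix='no action.', interval=60.0, clock=time.monotonic):
        super().__init__()
        self._prefix = prefix
        self._interval = interval
        self._clock = clock  # injectable so the behavior can be tested
        self._last = None

    def filter(self, record):
        if not record.getMessage().startswith(self._prefix):
            # Anything that is not the heartbeat is always logged.
            return True
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False
```

Attached with `handler.addFilter(HeartbeatThrottle())`, this would pass the first heartbeat, suppress the following ones for 60 seconds, and never touch other messages. Making `interval` configurable is one way to satisfy the request that the behavior be optional.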