
number_of_prefetched_tasks not decrementing #1364

Open
dpdoughe opened this issue Mar 28, 2024 · 4 comments

dpdoughe commented Mar 28, 2024

Describe the bug
The number_of_prefetched_tasks metric remains greater than zero and grows slowly after the service has been under load.

To Reproduce
I've seen number_of_prefetched_tasks slowly creep up (by 1 or 2 counts) after periods of heavy load (hundreds of thousands of tasks). I don't have a way to reproduce this in hand, as I don't yet know what the root cause might be.

What I think may be going on is that the code here

```python
if event_type == 'task-received' and not task.eta and task_received:
```

will increment the counter when all three conditions are met.

Meanwhile, the code here

```python
if event_type == 'task-started' and not task.eta and task_started and task_received:
```

is required to decrement the counter, and it only runs when all four conditions hold.

I wonder whether i) some tasks might fail to match all four decrementing criteria, or ii) there is some scenario where the corresponding event never gets fired at all.
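For reference, here is a paraphrased sketch of how those two branches sit together in Flower's event handler (simplified; variable and metric names are approximate, not a verbatim quote of the source):

```python
# Paraphrased sketch of the Prometheus bookkeeping in Flower's event handler
# (simplified; names approximate, not a verbatim quote).
def event(self, event):
    event_type = event['type']
    worker_name = event['hostname']
    task = self.tasks.get(event['uuid'])
    task_name = getattr(task, 'name', '') or ''
    task_received = task.received
    task_started = task.started

    # Incremented when a non-ETA task is received (i.e. prefetched) ...
    if event_type == 'task-received' and not task.eta and task_received:
        self.metrics.number_of_prefetched_tasks.labels(worker_name, task_name).inc()

    # ... but only decremented once that same task starts. A task that is
    # received and never starts leaves the gauge one count too high forever.
    if event_type == 'task-started' and not task.eta and task_started and task_received:
        self.metrics.number_of_prefetched_tasks.labels(worker_name, task_name).dec()
```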

Expected behavior
number_of_prefetched_tasks should decrease as well as increase, and should eventually return to zero during periods of low or no load.

Screenshots
[screenshot: GrafanaPrefetchedTasksCounter]

System information
Output of python -c 'from flower.utils import bugreport; print(bugreport())' command

```
flower   -> flower:2.0.1 tornado:6.4 humanize:4.9.0
software -> celery:5.3.6 (emerald-rush) kombu:5.3.4 py:3.8.10
            billiard:4.2.0 py-amqp:5.2.0
platform -> system:Linux arch:64bit
            kernel version:5.15.0-1053-azure imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:amqp results:disabled

deprecated_settings: None
```

dpdoughe added the bug label Mar 28, 2024

dpdoughe commented Mar 28, 2024

This seems to be some rare scenario in which the number of decrements doesn't equal the number of increments, which leads to a gradual net upward creep of number_of_prefetched_tasks over the server's uptime.


dpdoughe commented Apr 1, 2024

My current best guess about what is going on here is that this upward drift can happen when expires is set on a task. According to the docs,

> When a worker receives an expired task it will mark the task as REVOKED

Since the decrement logic

```python
if event_type == 'task-started' and not task.eta and task_started and task_received:
```

apparently makes no allowance for revoked tasks, the counter drifts upwards in proportion to the number of revoked tasks.

See https://docs.celeryq.dev/en/stable/reference/celery.events.state.html#celery.events.state.State.Task.revoked
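As a rough illustration of the state such a task ends up in (a hypothetical snippet using celery.events.state, not code from Flower):

```python
from celery.events.state import State

state = State()
# ... after feeding the worker's event stream into state.event(...) ...

# A task that expired after being prefetched has its `received` and `revoked`
# timestamps set, but `started` is still None, so the 'task-started' decrement
# branch can never match for it.
for task in state.tasks.values():
    if task.received and task.revoked and not task.started:
        print(f'{task.uuid} was prefetched and then revoked without ever starting')
```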


dpdoughe commented Apr 3, 2024

To conclusively test the theory, I created a task with expires=1 that internally slept for 100 seconds.
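For illustration, a minimal sketch of that reproduction (broker URL, task name, and counts are placeholders, not the exact code I ran):

```python
import time

from celery import Celery

# Placeholder broker URL; any broker that Flower is monitoring will do.
app = Celery('repro', broker='amqp://guest@localhost//')

@app.task
def slow_task():
    # Sleep far longer than the expiry so queued tasks expire before starting.
    time.sleep(100)

if __name__ == '__main__':
    # Queue many tasks that expire almost immediately. The worker receives
    # (prefetches) them, then marks the expired ones REVOKED without ever
    # emitting task-started.
    for _ in range(500):
        slow_task.apply_async(expires=1)
```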
Sending a bunch of async requests like this over a short period of time does result in the buggy behavior:
[screenshot: grafana_prefetched_no_decrement]

We can see the difference between a task that was prefetched and then revoked
[screenshot: flower_revoked_prefetched]

compared to a task that was revoked before ever becoming prefetched
[screenshot: flower_revoked_not_prefetched]

So the new code in my pull request looks for the attribute values needed to distinguish the two cases and decrements the counter only for tasks that are revoked after becoming prefetched.
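Roughly, the idea is along these lines (a sketch of the approach, not the actual diff; it reuses the variables from the existing handler shown earlier):

```python
# Sketch of the proposed fix: also decrement the gauge when a task is revoked
# after having been received (prefetched) but before it ever started.
task_revoked = task.revoked

if event_type == 'task-revoked' and not task.eta and task_revoked \
        and task_received and not task_started:
    self.metrics.number_of_prefetched_tasks.labels(worker_name, task_name).dec()
```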


dpdoughe commented Apr 3, 2024

@mher Please take a look at this issue.
