
rm_stm: couple of stability fixes noticed when down scaling max_concurrent_producer_ids #18573

Merged

Conversation

bharathv
Contributor

@bharathv bharathv commented May 18, 2024

On a cluster with a large pile-up of historical producer_ids, suddenly downscaling the max_concurrent_producer_ids configuration resulted in crashes.

  • The eviction logic spins up too many concurrent tasks at the same scheduling point (> 1M in this case), resulting in a 16MB allocation in the reactor path.
  • Loading too many snapshots at once (during bootstrap) was causing a big spike in memory usage; we can save some memory by aggressively releasing snapshot structures once their contents have been loaded into the desired destination.

TBD: Link GH issues.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

@bharathv
Contributor Author

/dt

@travisdowns
Member

Seems good. Isn't part of the problem that the evict loop here is not async, so the tasks just accumulate without getting any chance to run?

@travisdowns
Member

How often does the eviction tick? With only 100 per tick, is it possible that the PIDs can now grow faster than we evict them?

Instead of spawning one task per PID, it seems better to just collect all the PIDs we are going to evict and spawn off a single task to do the eviction. This would also solve the specific allocation failure.
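
A minimal sketch of what this suggestion could look like (the names producer_state, evict_batch, and eviction_tick are hypothetical stand-ins, not the PR's code): collect the candidates in one synchronous pass, then evict the whole batch from a single fiber.

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

#include <memory>
#include <tuple>
#include <vector>

// Stand-in for the real producer state; only what the sketch needs.
struct producer_state {
    seastar::future<> evict() { return seastar::make_ready_future<>(); }
};
using producer_ptr = std::shared_ptr<producer_state>;

// Evict the whole batch sequentially inside one fiber.
seastar::future<> evict_batch(std::vector<producer_ptr> victims) {
    for (auto& p : victims) {
        co_await p->evict();
    }
}

void eviction_tick(std::vector<producer_ptr> candidates) {
    // One detached task for the batch (real code would guard it with a gate)
    // instead of one task per PID queued at a single scheduling point.
    std::ignore = evict_batch(std::move(candidates));
}
```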

@bharathv
Contributor Author

> Isn't part of the problem that the evict loop here is not async, so the tasks just accumulate without getting any chance to run?

Ya right, the original intention was to avoid a scheduling point during the eviction tick so that the iterators in the list being traversed are not invalidated. I have another idea that gets rid of the async evict function; it seems to be working locally, and I'll clean it up and push it shortly.

> How often does the eviction tick? With only 100 per tick, is it possible that the PIDs can now grow faster than we evict them?
> Instead of spawning one task per PID, it seems better to just collect all the PIDs we are going to evict and spawn off a single task to do the eviction. This would also solve the specific allocation failure.

It runs every 5s; ya, probably 100 per tick is too small. I think with a non-futurized evict implementation (next PS), we can evict more in one go.

When a lot of partitions start up on the shard at the same time, we
noticed crashes in this part of the code when the snapshot sizes are
non-trivial (large # of producers in the snapshot). This patch releases
already-applied snapshot state to ease the memory pressure a bit.
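
As a rough illustration of the commit message above (the types producers_snapshot, producer_entry, and apply_one are invented stand-ins, not the real rm_stm snapshot code), the idea is to free each entry's storage as soon as it has been applied, rather than keeping the whole decoded snapshot alive until bootstrap finishes:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-producer snapshot entry; the vector of strings stands in
// for the bulky per-producer state kept in a real snapshot.
struct producer_entry {
    long producer_id{};
    std::vector<std::string> state;
};

struct producers_snapshot {
    std::vector<producer_entry> producers;
};

// Stand-in for moving one entry into the live, post-apply structures.
void apply_one(producer_entry entry) { /* details omitted */ }

void apply_snapshot(producers_snapshot snap) {
    while (!snap.producers.empty()) {
        // Move each entry out and drop its slot right away, so the bulky
        // per-producer state is released as soon as it is applied rather
        // than held until every partition on the shard has bootstrapped.
        producer_entry entry = std::move(snap.producers.back());
        snap.producers.pop_back();
        apply_one(std::move(entry));
    }
    snap.producers.shrink_to_fit(); // return the now-empty buffer as well
}
```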
@bharathv
Contributor Author

/ci-repeat 5

@bharathv
Contributor Author

/dt

@bharathv bharathv added this to the v23.3.16 milestone May 21, 2024
@bharathv
Contributor Author

/ci-repeat 5

@bharathv bharathv marked this pull request as ready for review May 21, 2024 22:34
@piyushredpanda
Contributor

piyushredpanda commented May 22, 2024

Known failure: #12897

This test ensures concurrent evictions can happen in the presence of
replication operations and operations that reset the state (snapshots,
partition stop).
Prior to this commit, producer_state::evict() was asynchronous because
it waited on a gate to drain any pending tasks on the producer state.
This resulted in a fiber per evict() call in the producer_state_manager
and hence a large number of fibers when evicting a ton of producers in
one eviction tick (which manifested as a large allocation).

This commit fixes the issue by making evict() synchronous and getting
rid of the gate that was forcing it. That is ok because the eviction
tick runs synchronously (without scheduling points) and only considers
candidates that are still linked to the shard-wide list. Any caller
function using the producer state either waits on op_lock or detaches
from the list before running the operation, ensuring that it is not
considered for eviction.
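
A minimal sketch of the approach the commit describes (member names such as idle, _producers, and evict_tick are assumptions for illustration, not the actual rm_stm code): evict() is synchronous and the tick never yields, so list iterators remain valid for the entire scan and no per-producer fiber is spawned.

```cpp
#include <cstddef>
#include <list>

// Stand-in for the real producer state; the actual code keeps producers on
// an intrusive, shard-wide list.
struct producer_state {
    // Callers either hold op_lock or unlink the producer before running an
    // operation, so anything still linked and idle is safe to evict.
    bool idle{true};
    void evict() { /* tear down state synchronously: no gate, no co_await */ }
};

struct producer_state_manager {
    std::list<producer_state> _producers;

    // The tick never yields, so iterators into the list stay valid and at
    // most max_to_evict producers are torn down per tick.
    void evict_tick(std::size_t max_to_evict) {
        std::size_t evicted = 0;
        for (auto it = _producers.begin();
             it != _producers.end() && evicted < max_to_evict;) {
            if (it->idle) {
                it->evict();
                it = _producers.erase(it);
                ++evicted;
            } else {
                ++it;
            }
        }
    }
};
```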
@travisdowns
Member

Were you able to reproduce the crash with the new tests you added?

@piyushredpanda piyushredpanda modified the milestones: v23.3.16, v23.3.x-next May 22, 2024
@bharathv
Contributor Author

> Were you able to reproduce the crash with the new tests you added?

The new tests in this PR mainly validate the correctness and memory safety of eviction, not the OOM issue we ran into. I'm not able to reproduce the exact OOM that the user hit by spawning too many tasks at once; perhaps it needs a lot of background noise in the test to ensure fragmentation. I'm pushing a small test that ensures only a limited number of producers are evicted in each tick, and that, combined with the fact that there is no longer a task per eviction, probably addresses the issue.

@piyushredpanda piyushredpanda merged commit 165b952 into redpanda-data:v23.3.x May 23, 2024
16 checks passed