Implement Buffer memory tracking and limits #24956

pauldix · 2024-05-03T14:37:29Z

Currently, the write buffer holds all data for an open segment in memory and persists it once half of the segment's duration has passed after the segment ends (e.g. a 1-hour segment is persisted 30 minutes after the hour has passed). If the server is configured with larger segments (1, 2 or 4 hours), this could eventually lead to OOMs under heavier write workloads.
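To make the current timing concrete, here is a small sketch of the persist rule as the example describes it: persist at the segment's end plus half its duration. `persist_at` and `max_time_in_buffer` are hypothetical helpers, and times are offsets from an arbitrary common epoch. It shows why larger segments are risky: with 4-hour segments, the first write in a segment stays in memory for up to 6 hours.

```rust
use std::time::Duration;

/// Hypothetical helper: when a segment is persisted under the current policy,
/// i.e. at the end of the segment plus half of its duration.
/// Times are offsets from an arbitrary common epoch.
fn persist_at(segment_start: Duration, segment_duration: Duration) -> Duration {
    segment_start + segment_duration + segment_duration / 2
}

/// Longest time a write can stay in the buffer: a write arriving at the very
/// start of a segment waits out the whole segment plus half its duration.
fn max_time_in_buffer(segment_duration: Duration) -> Duration {
    persist_at(Duration::ZERO, segment_duration)
}
```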

The server should have a configuration parameter that sets how much memory the buffer is allowed to use. Note that the server itself will use more memory than this, since it must also handle query processing and other work; this setting applies only to the buffer.

The flusher should track the growth rate of memory for the entire buffer. Each time it flushes writes to the WAL, it should check whether, at the current rate of growth, the buffer will pass the threshold within the next 300 seconds. If so, it should kick off a background task that persists data to free up memory, chosen according to the following criteria:
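A minimal sketch of the growth-rate check, reading it as "will the buffer cross the limit within the next 300 seconds at the current rate?". `BufferSizeTracker`, `should_persist` and `limit_bytes` (standing in for the configuration parameter described above) are hypothetical names, and for simplicity the rate is estimated from only the last two flushes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tracker: records the total buffer size after each WAL flush
/// and estimates whether the configured limit will be reached soon.
struct BufferSizeTracker {
    limit_bytes: u64,
    horizon: Duration,
    last: Option<(Instant, u64)>,
}

impl BufferSizeTracker {
    fn new(limit_bytes: u64, horizon: Duration) -> Self {
        Self { limit_bytes, horizon, last: None }
    }

    /// Call after each flush with the current total buffer size. Returns true
    /// if the buffer is already at the limit, or if, at the growth rate seen
    /// since the previous flush, it would cross the limit within `horizon`.
    fn should_persist(&mut self, now: Instant, size_bytes: u64) -> bool {
        let prev = self.last.replace((now, size_bytes));
        if size_bytes >= self.limit_bytes {
            return true;
        }
        let Some((prev_at, prev_size)) = prev else { return false };
        let elapsed = now.saturating_duration_since(prev_at).as_secs_f64();
        if elapsed <= 0.0 || size_bytes <= prev_size {
            return false; // not growing, or no time has passed
        }
        let rate = (size_bytes - prev_size) as f64 / elapsed; // bytes per second
        let secs_to_limit = (self.limit_bytes - size_bytes) as f64 / rate;
        secs_to_limit <= self.horizon.as_secs_f64()
    }
}
```

A production tracker would probably smooth the rate (for example with an exponential moving average) so that a single bursty flush does not trigger a persist on its own.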

Pick the largest segment that hasn't received writes for the lesser of 1/10th of its duration or 300 seconds, and persist the entire segment (run this in the background to free up the WAL op flusher)
If no segment matches the previous criterion:
1. Pick the oldest segment based on open time
2. Pick the largest table buffer based on size
3. Split the table buffer into 90% and 10% of its rows, with the 90% being the rows with the oldest times (from the time column)
4. Update the table buffer size so that it no longer includes the 90%
5. Write a record into the WAL that we're persisting the 90% (at this point the task moves to the background and the WAL op flusher is freed up)
6. Persist, then write a record into the WAL that the persist completed
7. Update the segment size to remove the persisted 90%
Limit the number of simultaneous persist jobs to the core count (the exact number matters less than having some limit)
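The selection criteria above can be sketched as a single function. This is a simplified model under stated assumptions: `Segment`, `TableBuffer`, `PersistPlan` and `choose_persist` are hypothetical types and names that keep only what selection needs, table size is approximated by row count, and the WAL records and the background persist itself (steps 5 to 7) are left out.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified table buffer: only the time column is kept.
struct TableBuffer {
    name: String,
    times: Vec<i64>, // one timestamp per row
}

/// Hypothetical, simplified buffer segment.
struct Segment {
    id: u64,
    opened_at: Instant,
    last_write_at: Instant,
    duration: Duration,
    size_bytes: u64,
    tables: Vec<TableBuffer>,
}

enum PersistPlan {
    /// Persist an entire idle segment.
    WholeSegment { segment_id: u64 },
    /// Persist the oldest 90% of one table's rows (identified here by time).
    TableSplit { segment_id: u64, table: String, persist_times: Vec<i64> },
}

/// A segment counts as idle after min(duration / 10, 300 s) without writes.
fn idle_threshold(duration: Duration) -> Duration {
    (duration / 10).min(Duration::from_secs(300))
}

fn choose_persist(segments: &mut [Segment], now: Instant) -> Option<PersistPlan> {
    // First choice: the largest idle segment, persisted whole.
    if let Some(seg) = segments
        .iter()
        .filter(|s| now.saturating_duration_since(s.last_write_at) >= idle_threshold(s.duration))
        .max_by_key(|s| s.size_bytes)
    {
        return Some(PersistPlan::WholeSegment { segment_id: seg.id });
    }
    // Fallback: oldest segment by open time, then its largest table
    // (row count used as a stand-in for byte size).
    let seg = segments.iter_mut().min_by_key(|s| s.opened_at)?;
    let table = seg.tables.iter_mut().max_by_key(|t| t.times.len())?;
    // Split: the oldest 90% of rows by time leave the buffer, the newest 10% stay.
    table.times.sort_unstable();
    let split_at = table.times.len() * 9 / 10;
    let persist_times: Vec<i64> = table.times.drain(..split_at).collect();
    Some(PersistPlan::TableSplit { segment_id: seg.id, table: table.name.clone(), persist_times })
}
```

For the concurrency limit, `std::thread::available_parallelism()` returns an estimate of the available parallelism (usually the core count); a fixed fallback would be needed if it returns an error.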

During WAL replay, we should not perform the actual persist; instead, we should keep the 90% set aside and continue replaying the WAL until we see the record that the persist completed. If replay finishes without that record, we should perform the persist and mark it in the WAL. All of this should happen during server startup, before we start accepting new writes and queries.
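The replay rule can be sketched as follows, assuming each persist is identified by a hypothetical `persist_id` carried in both its start and completion records (the issue doesn't say how the two records are matched). `WalRecord` and `replay` are illustrative names, not existing types.

```rust
use std::collections::HashMap;

/// Hypothetical WAL records relevant to persistence; a real WAL would also
/// carry the write batches themselves.
enum WalRecord {
    Write { segment_id: u64, rows: u64 },
    PersistStarted { persist_id: u64, segment_id: u64, table: String },
    PersistCompleted { persist_id: u64 },
}

/// Replays records, holding each started persist aside until its completion
/// record is seen. Returns (persist_id, segment_id, table) for persists that
/// never completed; these must be performed and marked in the WAL before the
/// server accepts writes and queries.
fn replay(records: &[WalRecord]) -> Vec<(u64, u64, String)> {
    let mut pending: HashMap<u64, (u64, String)> = HashMap::new();
    for rec in records {
        match rec {
            WalRecord::Write { .. } => { /* re-buffer the rows */ }
            WalRecord::PersistStarted { persist_id, segment_id, table } => {
                // Set the 90% aside instead of persisting it during replay.
                pending.insert(*persist_id, (*segment_id, table.clone()));
            }
            WalRecord::PersistCompleted { persist_id } => {
                // Persisted before the restart: the set-aside rows can be dropped.
                pending.remove(persist_id);
            }
        }
    }
    let mut incomplete: Vec<(u64, u64, String)> = pending
        .into_iter()
        .map(|(id, (seg, table))| (id, seg, table))
        .collect();
    incomplete.sort_by_key(|(id, _, _)| *id);
    incomplete
}
```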

If we have crossed the threshold, the WAL op flusher thread should wait, repeatedly checking the persist tasks to see when they finish and free up memory. It should kick off additional tasks based on the above criteria and loop until enough space is freed. During this time, the server will return 500 responses.
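A rough sketch of the cooldown loop, assuming persist jobs run on plain threads and the write handler checks a hypothetical `REJECT_WRITES` flag to decide when to return 500s; `cool_down`, `buffer_size` and `spawn_persist` are also illustrative names. A real server would more likely use async tasks and a semaphore, but the control flow would be similar: reap finished jobs, start new ones up to the limit, and keep rejecting writes until the buffer drops below the limit.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical flag consulted by the write handler: while set, writes are
/// rejected with a 500 response.
static REJECT_WRITES: AtomicBool = AtomicBool::new(false);

/// `buffer_size` reports the current buffered bytes; `spawn_persist` starts one
/// more persist job chosen by the selection criteria, or returns None if there
/// is nothing left to start.
fn cool_down(
    limit_bytes: u64,
    max_jobs: usize,
    buffer_size: impl Fn() -> u64,
    mut spawn_persist: impl FnMut() -> Option<JoinHandle<()>>,
) {
    let mut jobs: Vec<JoinHandle<()>> = Vec::new();
    while buffer_size() >= limit_bytes {
        REJECT_WRITES.store(true, Ordering::SeqCst);
        // Drop handles of persist jobs that have finished.
        jobs.retain(|j| !j.is_finished());
        // Start more jobs, up to the concurrency limit.
        while jobs.len() < max_jobs {
            match spawn_persist() {
                Some(job) => jobs.push(job),
                None => break,
            }
        }
        thread::sleep(Duration::from_millis(10));
    }
    // Below the limit again: accept writes. Jobs still running keep going
    // in the background (their handles are dropped here).
    REJECT_WRITES.store(false, Ordering::SeqCst);
}
```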

Tasks

Add last write instant to buffer segment
Wire up size tracking for table buffer and buffer segment, return total buffer size on write
Implement buffer size tracker to predict when we need to persist and trigger it
Implement table buffer split and persist
Implement flusher cooldown loop that stops writes while waiting for memory to get freed
Implement WAL replay logic

pauldix added v3 epic/perf-prototyping labels May 3, 2024

pauldix self-assigned this May 3, 2024

pauldix mentioned this issue May 20, 2024

feat: Add last_write_time and table buffer size #25017

Merged
