Implement Buffer memory tracking and limits #24956

Open
2 of 6 tasks
pauldix opened this issue May 3, 2024 · 0 comments

Comments

@pauldix
Member

pauldix commented May 3, 2024

Currently, the write buffer holds all data in memory while a segment is open and persists it once half of the segment's duration has elapsed after the segment ends (e.g. a 1-hour segment is persisted 30 minutes after the hour has passed). If the server is configured with larger segments (1, 2 or 4 hours), this could eventually lead to OOMs under higher write workloads.
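
To make the timing concrete, here is a small Python sketch (the helper name `persist_time` is hypothetical) that computes when a segment becomes eligible for persistence under the rule above, assuming a segment covers `[start, start + duration)`. It shows why larger segments are risky: the earliest data in a segment stays in memory for 1.5× the segment duration, so 6 hours for a 4-hour segment.

```python
from datetime import datetime, timedelta, timezone


def persist_time(segment_start: datetime, segment_duration: timedelta) -> datetime:
    """Return when a segment's buffered data is persisted: the end of the
    segment plus half of its duration (hypothetical helper)."""
    segment_end = segment_start + segment_duration
    return segment_end + segment_duration / 2


start = datetime(2024, 5, 3, 10, 0, tzinfo=timezone.utc)
print(persist_time(start, timedelta(hours=1)))  # 2024-05-03 11:30:00+00:00
print(persist_time(start, timedelta(hours=4)))  # 2024-05-03 16:00:00+00:00
```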

The server should have a configuration parameter that sets how much memory the buffer is allowed to use. Note that the server's total memory use will exceed this value, since it also needs memory for query processing and other work. This setting applies only to the buffer.

The flusher should track the memory growth rate of the entire buffer. Each time it flushes writes to the WAL, it should check whether, at the current growth rate, the buffer will cross the threshold within the next 300 seconds. If so, it should kick off a background task to persist data and free up memory, based on the following criteria:
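One way to implement the growth-rate check is sketched below in Python. It assumes (not specified in the issue) that the flusher records a `(timestamp, buffer_bytes)` sample on every WAL flush and estimates the rate from a small sliding window of samples; `GrowthTracker`, `record_flush`, and the window size are hypothetical names and values.

```python
from collections import deque


class GrowthTracker:
    """Hypothetical tracker: keeps recent (timestamp_secs, buffer_bytes) samples,
    one per WAL flush, and projects when the memory limit will be crossed."""

    def __init__(self, limit_bytes: int, horizon_secs: float = 300.0, window: int = 16):
        self.limit_bytes = limit_bytes
        self.horizon_secs = horizon_secs
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically

    def record_flush(self, now_secs: float, buffer_bytes: int) -> bool:
        """Record a sample; return True if a background persist should start."""
        self.samples.append((now_secs, buffer_bytes))
        if buffer_bytes >= self.limit_bytes:
            return True  # already over the limit
        if len(self.samples) < 2:
            return False  # need two samples to estimate a rate
        (t0, b0), (t1, b1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return False
        rate = (b1 - b0) / (t1 - t0)  # average bytes/second over the window
        if rate <= 0:
            return False  # buffer is flat or shrinking
        secs_until_limit = (self.limit_bytes - buffer_bytes) / rate
        return secs_until_limit <= self.horizon_secs


tracker = GrowthTracker(limit_bytes=10_000)
tracker.record_flush(0, 100)     # False: only one sample so far
tracker.record_flush(10, 200)    # False: 10 B/s, limit ~980 s away
tracker.record_flush(20, 2_000)  # True: 95 B/s, limit ~84 s away
```

A windowed average smooths out bursty individual flushes; an exponentially weighted rate would be an equally reasonable choice.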

  1. Pick the largest segment that hasn't received writes for the lesser of 1/10th of its duration and 300 seconds, and persist the entire segment (background this and free up the WAL op flusher)
  2. If no segment matched the previous criteria:
    1. Pick the oldest segment based on open time
    2. Pick the largest table buffer based on size
    3. Split the table buffer's rows into 90% and 10%, with the 90% being the rows with the oldest times (from the time column)
    4. Update the table buffer size so that it no longer includes the 90%
    5. Write a record to the WAL that we're persisting the 90% (now we background the task and free up the WAL op flusher)
    6. Persist, then write a record to the WAL that the persist completed
    7. Update the segment size to remove the persisted 90%
  3. Limit the number of simultaneous persist jobs to the core count (any reasonable bound works here)
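
The selection criteria can be sketched in Python as follows. This is a simplified model under stated assumptions: a table buffer is represented only by its time-column values, row count stands in for byte size, and `TableBuffer`, `Segment`, `pick_idle_segment`, `split_oldest`, and `plan_persist` are hypothetical names. It covers criterion 1 and steps 2.1 to 2.4; the WAL records, the persist itself, and the segment size update (steps 2.5 to 2.7) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class TableBuffer:
    name: str
    times: list  # time-column values of buffered rows (stand-in for full rows)

    @property
    def size(self) -> int:
        return len(self.times)  # hypothetical: row count stands in for bytes


@dataclass
class Segment:
    segment_id: int
    open_time: float
    duration_secs: float
    last_write: float
    tables: dict = field(default_factory=dict)

    @property
    def size(self) -> int:
        return sum(t.size for t in self.tables.values())


def pick_idle_segment(segments, now):
    """Criterion 1: largest segment with no writes for min(duration / 10, 300 s)."""
    idle = [s for s in segments
            if now - s.last_write >= min(s.duration_secs / 10, 300.0)]
    return max(idle, key=lambda s: s.size, default=None)


def split_oldest(table, fraction=0.9):
    """Step 2.3: the oldest `fraction` of rows by time, and the remainder."""
    ordered = sorted(table.times)
    cut = int(len(ordered) * fraction)
    return ordered[:cut], ordered[cut:]


def plan_persist(segments, now):
    """Return a hypothetical job description, or None if nothing qualifies."""
    seg = pick_idle_segment(segments, now)
    if seg is not None:
        return ("segment", seg.segment_id, None)                   # 1. whole segment
    if not segments:
        return None
    oldest = min(segments, key=lambda s: s.open_time)              # 2.1 oldest by open time
    if not oldest.tables:
        return None
    table = max(oldest.tables.values(), key=lambda t: t.size)      # 2.2 largest table
    to_persist, remaining = split_oldest(table)                    # 2.3 90% / 10% split
    table.times = remaining                                        # 2.4 buffer drops the 90%
    return ("split", oldest.segment_id, (table.name, to_persist))
```

With 1-hour segments the idle threshold is `min(360, 300) = 300` seconds, so a segment last written 400 seconds ago qualifies for a whole-segment persist, while one written 5 seconds ago falls through to the 90/10 split.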

During WAL replay, we should not perform the actual persist; instead, we keep the 90% set aside and continue replaying the WAL until we see the persist-completed record. If WAL replay finishes without that record, we should perform the persist and then mark it as completed in the WAL. All of this should happen during server startup, before we start accepting new writes and queries.
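The replay rule can be sketched as a single pass over the WAL. The record layout is an assumption: here a hypothetical `WalOp` carries a `kind` (`"write"`, `"persist_started"`, `"persist_completed"`), a `persist_id`, and, for `persist_started`, the exact rows being persisted. `persist` and `append_to_wal` are hypothetical callbacks supplied by the caller; the function would run before the server accepts writes or queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalOp:
    kind: str                        # "write" | "persist_started" | "persist_completed"
    persist_id: Optional[int] = None
    rows: Optional[list] = None


def replay(wal_ops, persist, append_to_wal):
    """Rebuild the in-memory buffer from the WAL without persisting during replay."""
    buffered = []   # rows that end up back in the in-memory buffer
    set_aside = {}  # persist_id -> rows whose persist started but wasn't confirmed
    for op in wal_ops:
        if op.kind == "write":
            buffered.extend(op.rows)
        elif op.kind == "persist_started":
            # Don't persist yet: move the rows out of the buffer and hold them aside.
            moving = set(op.rows)
            buffered = [r for r in buffered if r not in moving]
            set_aside[op.persist_id] = op.rows
        elif op.kind == "persist_completed":
            # The persist finished before shutdown; its rows are already stored.
            set_aside.pop(op.persist_id, None)
    # Persists that started but never completed: perform them now and mark them done.
    for persist_id, rows in set_aside.items():
        persist(persist_id, rows)
        append_to_wal(WalOp("persist_completed", persist_id))
    return buffered
```

In this model, rows from a completed persist are simply dropped, and each unfinished persist is redone exactly once at the end of replay.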

If we have crossed the threshold, the WAL op flusher thread should wait and keep checking the persist tasks to see when they have finished and freed up memory. It should kick off additional tasks based on the criteria above and loop until enough memory is freed. During this time the server will return 500 responses.
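The blocking loop might look like the Python sketch below. Assumptions: the buffer is modeled as a hypothetical `Buffer` holding a list of persistable chunk sizes, "pick the largest chunk" stands in for the full selection criteria, and `persist` is a hypothetical callback that writes a chunk and returns the bytes it frees. Concurrency is capped at the core count via `os.cpu_count()`, and the `accepting_writes` flag is where a write handler would decide to return 500. A real flusher would keep a long-lived pool rather than create one per call.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass


@dataclass
class Buffer:
    chunks: list            # sizes in bytes of persistable units (hypothetical)
    limit_bytes: int
    accepting_writes: bool = True


def write_status(buf: Buffer) -> int:
    """HTTP status a write handler would return (hypothetical)."""
    return 200 if buf.accepting_writes else 500


def free_memory(buf: Buffer, persist, max_jobs=None) -> None:
    """Block the WAL op flusher until the buffer is back under its limit."""
    max_jobs = max_jobs or os.cpu_count() or 1
    buf.accepting_writes = False
    in_flight = 0  # bytes handed to persist jobs but not yet freed
    running = set()
    try:
        with ThreadPoolExecutor(max_workers=max_jobs) as pool:
            while sum(buf.chunks) + in_flight > buf.limit_bytes:
                # Start more jobs until what's left unscheduled would fit under the limit.
                while (buf.chunks and len(running) < max_jobs
                       and sum(buf.chunks) > buf.limit_bytes):
                    chunk = max(buf.chunks)  # stand-in for the selection criteria
                    buf.chunks.remove(chunk)
                    in_flight += chunk
                    running.add(pool.submit(persist, chunk))
                if not running:
                    raise RuntimeError("over the limit with nothing left to persist")
                done, running = wait(running, return_when=FIRST_COMPLETED)
                for fut in done:
                    in_flight -= fut.result()  # memory is freed only once persisted
    finally:
        buf.accepting_writes = True
```

Only the flusher thread updates the byte accounting (from completed futures), so the persist workers need no shared lock.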

Tasks
