feat: replay all regions in a batched manner #3808

Draft
wants to merge 1 commit into
base: main

niebayes
Contributor

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

When using Kafka WAL, multiple regions can be assigned to the same topic. Replaying each region separately therefore pulls the same WAL entries repeatedly, and the read amplification of region replay can become very high. This PR aims to remove that read amplification by replaying regions in a batched manner.

Specifically, each RegionOpenRequest now includes an optional WalReader field, which holds a mutable reference to the WalReader instance. If the field is Some, region replay is bypassed while the region is opened, and the information needed to replay that region is stored in the WalReader instead.

Once the region server completes handling all RegionOpenRequests, the WalReader contains the necessary information for replaying all regions. This information enables replaying all regions in a batched manner without introducing any read amplification.
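The collect-then-replay flow above can be illustrated with a minimal sketch. It is not the code in this PR: `Entry`, `ReplayPlan`, `register`, and `replay_topic` are hypothetical names, and the idea of a per-region starting entry id is an assumption made for illustration. The point is only that each entry of a shared topic is read once and routed to the region that owns it.

```rust
use std::collections::HashMap;

/// Hypothetical WAL entry: one record in a topic shared by several regions.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub region_id: u64,
    pub entry_id: u64,
    pub payload: String,
}

/// Hypothetical stand-in for the information the reader accumulates
/// while the region server handles open requests.
#[derive(Default)]
pub struct ReplayPlan {
    /// region id -> first entry id that still needs replaying (assumed).
    start_ids: HashMap<u64, u64>,
}

impl ReplayPlan {
    /// Called when a region is opened, instead of replaying it immediately.
    pub fn register(&mut self, region_id: u64, start_entry_id: u64) {
        self.start_ids.insert(region_id, start_entry_id);
    }

    /// Single pass over a topic's entries: each entry is visited once
    /// and routed to its region, so nothing is pulled repeatedly.
    pub fn replay_topic(&self, entries: &[Entry]) -> HashMap<u64, Vec<String>> {
        let mut per_region: HashMap<u64, Vec<String>> = HashMap::new();
        for entry in entries {
            match self.start_ids.get(&entry.region_id) {
                Some(&start) if entry.entry_id >= start => {
                    per_region
                        .entry(entry.region_id)
                        .or_default()
                        .push(entry.payload.clone());
                }
                // Region not registered, or entry older than its start id.
                _ => {}
            }
        }
        per_region
    }
}
```

With per-region replay, a topic shared by N regions would be scanned N times; in this sketch it is scanned once regardless of N.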

In addition, to support replaying regions by topic rather than by region, we have added the group_by_namespace interface to the LogStore. This interface groups the input namespaces according to rules specific to the log store. For Kafka WAL, it groups namespaces by topic.
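As a rough illustration of grouping by topic, the sketch below uses a hypothetical `Namespace` struct and a free function `group_by_topic`. The actual signature of group_by_namespace in the LogStore trait is not shown in this description, so the types here are assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical Kafka WAL namespace: a region bound to a topic.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Namespace {
    pub region_id: u64,
    pub topic: String,
}

/// Groups namespaces by topic so that each topic needs to be read only once.
/// Returns topic -> region ids, with region ids in input order.
pub fn group_by_topic(namespaces: Vec<Namespace>) -> BTreeMap<String, Vec<u64>> {
    let mut groups: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    for ns in namespaces {
        groups.entry(ns.topic).or_default().push(ns.region_id);
    }
    groups
}
```

A `BTreeMap` is used here only to keep the iteration order deterministic; a `HashMap` would work equally well for the grouping itself.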

Once we obtain the grouped namespaces, we fetch logs from multiple namespaces (i.e., topics) in parallel. We parse the data from each log and insert it into a RegionPutRequest, which is then processed by the region server. Since replayed data must not be written to the WAL again, we have added an Option to the RegionPutRequest. If it is Some, the request carries replayed data. If it is None, the request carries newly written data.
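The Some/None distinction can be sketched as follows. The description does not say what the Option contains, so `ReplayInfo` and the simplified `PutRequest` and `Region` types are hypothetical; only the branching on the Option reflects the text above.

```rust
/// Hypothetical marker attached to replayed writes (contents assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReplayInfo {
    pub entry_id: u64,
}

/// Simplified stand-in for a put request.
pub struct PutRequest {
    pub rows: Vec<String>,
    /// Some => data replayed from the WAL; None => newly written data.
    pub replay: Option<ReplayInfo>,
}

/// Simplified stand-in for a region with a WAL and a memtable.
#[derive(Default)]
pub struct Region {
    pub wal: Vec<String>,
    pub memtable: Vec<String>,
}

impl Region {
    pub fn handle_put(&mut self, req: PutRequest) {
        // Replayed rows already exist in the WAL, so only new rows are appended.
        if req.replay.is_none() {
            self.wal.extend(req.rows.iter().cloned());
        }
        // Both kinds of request end up in the memtable.
        self.memtable.extend(req.rows);
    }
}
```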

This PR is still a draft. I will add the necessary tests and refine the code.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@niebayes niebayes self-assigned this Apr 26, 2024
@niebayes niebayes requested review from waynexia, v0y4g3r, evenyag and a team as code owners April 26, 2024 09:05
@niebayes niebayes marked this pull request as draft April 26, 2024 09:05
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 26, 2024