-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: reduce sorted runs during compaction #3702
base: main
Are you sure you want to change the base?
Conversation
2c0f146
to
2dc724e
Compare
f43d397
to
0dd58a5
Compare
0dd58a5
to
b4e29b9
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3702 +/- ##
==========================================
- Coverage 85.64% 85.33% -0.32%
==========================================
Files 953 953
Lines 162835 163544 +709
==========================================
+ Hits 139467 139553 +86
- Misses 23368 23991 +623 |
#[serde_as(as = "DisplayFromStr")] | ||
pub max_active_window_files: usize, | ||
pub max_active_window_runs: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks so. This change will fail the (not existing at present) compatibility test.
How about keeping the old name and only change the document?
#[serde_as(as = "DisplayFromStr")] | ||
pub max_active_window_files: usize, | ||
pub max_active_window_runs: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks so. This change will fail the (not existing at present) compatibility test.
How about keeping the old name and only change the document?
@@ -159,7 +169,7 @@ async fn test_compaction_region_with_overlapping() { | |||
let request = CreateRequestBuilder::new() | |||
.insert_option("compaction.type", "twcs") | |||
.insert_option("compaction.twcs.max_active_window_files", "2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change this as well?
(and in line 271)
let mut current_item = all_items[0].clone(); | ||
|
||
for item in all_items.into_iter().skip(1) { | ||
if current_item.overlap(&item) { | ||
current_item = current_item.merge(item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
let mut current_item = all_items[0].clone(); | |
for item in all_items.into_iter().skip(1) { | |
if current_item.overlap(&item) { | |
current_item = current_item.merge(item); | |
let mut res = SortedRun::default(); | |
let mut all_items_iter = all_items.into_iter(); | |
let mut current_item = all_items_iter.next().unwrap(); | |
for item in all_items_iter { | |
if current_item.overlap(&item) { |
reduce one clone
.flat_map(|r| r.items.into_iter()) | ||
.collect::<Vec<_>>(); | ||
|
||
all_items.sort_unstable_by(|l, r| l.start.cmp(&r.start).then(l.end.cmp(&r.end).reverse())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse sort_ranged_items
above? I suppose all MergeItems
are merged here and only contains one Item
for each
res.push_item(std::mem::replace(&mut current_item, item)); | ||
} | ||
} | ||
res.push_item(current_item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be easier to understand
res.push_item(current_item); | |
res.push_item(current_item); | |
current_item = item; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
What's the progress of this PR? @v0y4g3r |
@v0y4g3r What's the status of this PR? It's deprecated or just wait for resolving conflicts. |
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR changes the behavior during compaction from reduce file nums to sorted runs.
A sorted run in GreptimeDB is defined as a sequence of non-overlapping SST files. If multiple sorted runs exist in some time window, even a simple query with explicit time ranges may invole multiple files. Before this PR, GreptimeDB merges all files in active window into one single file if the total num of files exceeds
max_active_window_files
, of which the write amplification is rather severe. In this PR, we only enforce amax_active_window_runs
in active window to reduce read amplication. If num of sorted runs exceeds that threshold, it only merges the overlapping files on demand.Checklist