Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: reduce sorted runs during compaction #3702

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

v0y4g3r
Copy link
Contributor

@v0y4g3r v0y4g3r commented Apr 14, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR changes the behavior during compaction from reduce file nums to sorted runs.

A sorted run in GreptimeDB is defined as a sequence of non-overlapping SST files. If multiple sorted runs exist in some time window, even a simple query with explicit time ranges may invole multiple files. Before this PR, GreptimeDB merges all files in active window into one single file if the total num of files exceeds max_active_window_files, of which the write amplification is rather severe. In this PR, we only enforce a max_active_window_runs in active window to reduce read amplication. If num of sorted runs exceeds that threshold, it only merges the overlapping files on demand.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@v0y4g3r v0y4g3r marked this pull request as ready for review April 26, 2024 08:38
Copy link

codecov bot commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 99.67427% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 85.33%. Comparing base (4eadd9f) to head (504399b).
Report is 41 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3702      +/-   ##
==========================================
- Coverage   85.64%   85.33%   -0.32%     
==========================================
  Files         953      953              
  Lines      162835   163544     +709     
==========================================
+ Hits       139467   139553      +86     
- Misses      23368    23991     +623     

#[serde_as(as = "DisplayFromStr")]
pub max_active_window_files: usize,
pub max_active_window_runs: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a breaking change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks so. This change will fail the (not existing at present) compatibility test.

How about keeping the old name and only change the document?

src/mito2/src/engine/compaction_test.rs Show resolved Hide resolved
src/mito2/src/engine/compaction_test.rs Outdated Show resolved Hide resolved
src/mito2/src/engine/compaction_test.rs Outdated Show resolved Hide resolved
src/mito2/src/region/options.rs Outdated Show resolved Hide resolved
src/mito2/src/compaction/run.rs Show resolved Hide resolved
src/mito2/src/compaction/run.rs Outdated Show resolved Hide resolved
src/mito2/src/compaction/run.rs Outdated Show resolved Hide resolved
src/mito2/src/compaction/run.rs Outdated Show resolved Hide resolved
src/mito2/src/compaction/run.rs Outdated Show resolved Hide resolved
@v0y4g3r v0y4g3r requested a review from evenyag May 6, 2024 08:52
#[serde_as(as = "DisplayFromStr")]
pub max_active_window_files: usize,
pub max_active_window_runs: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks so. This change will fail the (not existing at present) compatibility test.

How about keeping the old name and only change the document?

@@ -159,7 +169,7 @@ async fn test_compaction_region_with_overlapping() {
let request = CreateRequestBuilder::new()
.insert_option("compaction.type", "twcs")
.insert_option("compaction.twcs.max_active_window_files", "2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this as well?

(and in line 271)

Comment on lines +232 to +236
let mut current_item = all_items[0].clone();

for item in all_items.into_iter().skip(1) {
if current_item.overlap(&item) {
current_item = current_item.merge(item);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
let mut current_item = all_items[0].clone();
for item in all_items.into_iter().skip(1) {
if current_item.overlap(&item) {
current_item = current_item.merge(item);
let mut res = SortedRun::default();
let mut all_items_iter = all_items.into_iter();
let mut current_item = all_items_iter.next().unwrap();
for item in all_items_iter {
if current_item.overlap(&item) {

reduce one clone

.flat_map(|r| r.items.into_iter())
.collect::<Vec<_>>();

all_items.sort_unstable_by(|l, r| l.start.cmp(&r.start).then(l.end.cmp(&r.end).reverse()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse sort_ranged_items above? I suppose all MergeItems are merged here and only contains one Item for each

res.push_item(std::mem::replace(&mut current_item, item));
}
}
res.push_item(current_item);
Copy link
Member

@waynexia waynexia May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be easier to understand

Suggested change
res.push_item(current_item);
res.push_item(current_item);
current_item = item;

Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@killme2008
Copy link
Contributor

What's the progress of this PR? @v0y4g3r

@killme2008
Copy link
Contributor

@v0y4g3r What's the status of this PR? It's deprecated or just wait for resolving conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-required This change requires docs update.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants