Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 5.4] repair: Introduce repair_partition_estimation_ratio config option #18658

Closed
wants to merge 1 commit into from

Conversation

mergify[bot]
Copy link

@mergify mergify bot commented May 14, 2024

In commit 642f9a1 (repair: Improve estimated_partitions to reduce memory usage), a 10% hard coded estimation ratio is used.

This patch introduces a new config option to specify the estimation ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

(cherry picked from commit 340eae0)

Refs #18634

In commit 642f9a1 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

(cherry picked from commit 340eae0)
@mergify mergify bot requested review from tgrabiec and nyh as code owners May 14, 2024 02:42
@mergify mergify bot assigned asias May 14, 2024
@mykaul
Copy link
Contributor

mykaul commented May 16, 2024

compiling failure

repair/row_level.cc:2937:81: error: member access into incomplete type 'const db::config'

@mykaul
Copy link
Contributor

mykaul commented May 16, 2024

@asias - please check.

@mykaul
Copy link
Contributor

mykaul commented May 16, 2024

CC @yaronkaikov

@mykaul
Copy link
Contributor

mykaul commented May 20, 2024

@asias - are you looking into this?

@asias
Copy link
Contributor

asias commented May 21, 2024

@asias - are you looking into this?

Yes. PR is here #18780.

We need to backport #18780 together with 340eae0)

@denesb
Copy link
Contributor

denesb commented May 21, 2024

@asias - are you looking into this?

Yes. PR is here #18780.

We need to backport #18780 together with 340eae0)

This is overkill, lets just fix-up this backport, adding the include. Evidently, on master the header is included indirectly already, so not needed there.

@asias
Copy link
Contributor

asias commented May 21, 2024

@asias - are you looking into this?

Yes. PR is here #18780.
We need to backport #18780 together with 340eae0)

This is overkill, lets just fix-up this backport, adding the include. Evidently, on master the header is included indirectly already, so not needed there.

We'd better include it directly even in master.

@denesb
Copy link
Contributor

denesb commented May 21, 2024

We'd better include it directly even in master.

I don't see a benefit. Our policy is to include only headers, without which the file doesn't build. This is clearly not the case on master, it builds fine without the include.

@asias
Copy link
Contributor

asias commented May 21, 2024

We'd better include it directly even in master.

I don't see a benefit. Our policy is to include only headers, without which the file doesn't build. This is clearly not the case on master, it builds fine without the include.

It builds on master fine just by luck because other header happens to include the db/config.hh. If at some point, we drop the lucky header we need to add db/config.hh explicitly.

The benefit is that

  • include what it is used explicitly
  • we do not need to manually edit the backport for 10 different branches
  • the difference between master and backport branch is minimized.

@denesb
Copy link
Contributor

denesb commented May 21, 2024

We'd better include it directly even in master.

I don't see a benefit. Our policy is to include only headers, without which the file doesn't build. This is clearly not the case on master, it builds fine without the include.

It builds on master fine just by luck because other header happens to include the db/config.hh. If at some point, we drop the lucky header we need to add db/config.hh explicitly.

Yes, this is standard practice in our code-base. What makes this one headers so special that we want to make an exception?
If we think the other way is better (include everything directly), then we should do it for all headers, not just randomly.

The benefit is that

* include what it is used explicitly

* we do not need to manually edit the backport for 10 different branches

* the difference between master and backport branch is minimized.

@asias
Copy link
Contributor

asias commented May 21, 2024

We'd better include it directly even in master.

I don't see a benefit. Our policy is to include only headers, without which the file doesn't build. This is clearly not the case on master, it builds fine without the include.

It builds on master fine just by luck because other header happens to include the db/config.hh. If at some point, we drop the lucky header we need to add db/config.hh explicitly.

Yes, this is standard practice in our code-base. What makes this one headers so special that we want to make an exception? If we think the other way is better (include everything directly), then we should do it for all headers, not just randomly.

If a.cc needs x.hh, it is better to include x.hh directly in a.cc. Even if z.hh include x.hh and a.cc include z.hh, because nothing guarantees z.hh still include x.hh two weeks later. Our policy is not to minimize the number of include files.

@mykaul
Copy link
Contributor

mykaul commented May 21, 2024

We could try to run https://github.com/include-what-you-use/include-what-you-use on the source code. I've tried it once (on C code) and it worked fine - to an extent - it did not handle different platforms (BSD, etc.) that had include with ifdef per platform.

@denesb
Copy link
Contributor

denesb commented May 21, 2024

If a.cc needs x.hh, it is better to include x.hh directly in a.cc. Even if z.hh include x.hh and a.cc include z.hh, because nothing guarantees z.hh still include x.hh two weeks later. Our policy is not to minimize the number of include files.
We could try to run https://github.com/include-what-you-use/include-what-you-use on the source code. I've tried it once (on C code) and it worked fine - to an extent - it did not handle different platforms (BSD, etc.) that had include with ifdef per platform.

Include what you use is a good coding practice, but we don't follow it and a random backport PR is not the appropriate place to suggest using it, instead of what we do currently.

We currently use indirect includes, which is also a valid method and the fact that z.hh might not include x.hh tomorrow anymore is not a reason to not use it. When that happens there will be a compiler error, and whoever removes x.hh from z.hh will have to deal with the fallout. This is what we do today. Lets not start a revolution here.

If you guys want to suggest we start using include-what-you-use, please send a patch to the coding style and lets get consensus on it first.

@asias
Copy link
Contributor

asias commented May 21, 2024

If a.cc needs x.hh, it is better to include x.hh directly in a.cc. Even if z.hh include x.hh and a.cc include z.hh, because nothing guarantees z.hh still include x.hh two weeks later. Our policy is not to minimize the number of include files.
We could try to run https://github.com/include-what-you-use/include-what-you-use on the source code. I've tried it once (on C code) and it worked fine - to an extent - it did not handle different platforms (BSD, etc.) that had include with ifdef per platform.

Include what you use is a good coding practice, but we don't follow it and a random backport PR is not the appropriate place to suggest using it, instead of what we do currently.

We currently use indirect includes, which is also a valid method and the fact that z.hh might not include x.hh tomorrow anymore is not a reason to not use it. When that happens there will be a compiler error, and whoever removes x.hh from z.hh will have to deal with the fallout. This is what we do today. Lets not start a revolution here.

I also mentioned other benefits here #18658 (comment).

I tried my best to include what .cc uses. The whole thing here demos why it is better to do this way. If I did not miss the missing db/config.hh the automatic backport will just work. There will be no such discussion at all.

The PR #18780 fixes the issue and makes thing in the right direction IMO.

If you guys want to suggest we start using include-what-you-use, please send a patch to the coding style and lets get consensus on it first.

There are no docs in scylla.git that forces people to prefer indirect include as the coding practice. This means people should use common good coding practice.

@denesb
Copy link
Contributor

denesb commented May 21, 2024

Putting aside the bike-shedding on what to include... #18780 is not backport material. It is not fixing a bug. branch-5.4 builds fine and passes the tests without it. So no maintainer will backport that.
You have to fix the missing header here, #18780 is completely orthogonal as it only affects master and it won't be backported.

@asias
Copy link
Contributor

asias commented May 22, 2024

Putting aside the bike-shedding on what to include... #18780 is not backport material. It is not fixing a bug. branch-5.4 builds fine and passes the tests without it. So no maintainer will backport that. You have to fix the missing header here, #18780 is completely orthogonal as it only affects master and it won't be backported.

I would say the backport for the function of "repair: Introduce repair_partition_estimation_ratio config option" consists two commits 340eae0 and #18780 from master. The are not orthogonal.

Instead of manual edit to achieve what #18780 does exactly in the backport for 5.4, IMHO it is better to do it WITHOUT manual editing, for branch 5.4 and more branches that we want backport to.

If there are 10 backport branches, are you going to manual edit 10 times. I do not see the point why you want to do the manual editing for the backport.

@denesb
Copy link
Contributor

denesb commented May 22, 2024

Putting aside the bike-shedding on what to include... #18780 is not backport material. It is not fixing a bug. branch-5.4 builds fine and passes the tests without it. So no maintainer will backport that. You have to fix the missing header here, #18780 is completely orthogonal as it only affects master and it won't be backported.

I would say the backport for the function of "repair: Introduce repair_partition_estimation_ratio config option" consists two commits 340eae0 and #18780 from master. The are not orthogonal.

I disagree, but I will merge said PR so you can cherry-pick it into this PR, so we can move forward.

Instead of manual edit to achieve what #18780 does exactly in the backport for 5.4, IMHO it is better to do it WITHOUT manual editing, for branch 5.4 and more branches that we want backport to.

According to my definition of simple, editing the backport commit is simpler than cherry-picking another commit, then adjust the cover letter to explain why further patches are needed.

If there are 10 backport branches, are you going to manual edit 10 times. I do not see the point why you want to do the manual editing for the backport.

Yes, we often do this. Sometimes the code almost has to be rewritten from scratch, because the versions are so different. Such is life.

@asias
Copy link
Contributor

asias commented May 27, 2024

Closed in favor of #18881.

@asias asias closed this May 27, 2024
@mergify mergify bot deleted the mergify/copy/branch-5.4/pr-18634 branch May 27, 2024 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants