Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow system compactions to run if zero user compaction jobs have run #4480

Merged
merged 19 commits into from May 18, 2024

Conversation

cshannon
Copy link
Contributor

@cshannon cshannon commented Apr 20, 2024

This change will allow system compactions to postpone user compactions that have had no jobs run yet. Before this, if a user compaction was in the queue and had selected files that overlapped it would block system compactions from running. Now if there are selected files, but the user compaction is not running and hasn't had any jobs completed, the coordinator will clear the selectedFiles column so that the system compaction can run. The fate operation will reset the column again while trying to make progress.

The current test works by creating two tables and setting compactions to be slow using the slow iterator and also setting system compactions to be disabled. The test will start compacting one table so the compactor is busy. Next a user compaction is started that will waiting in the queue and then the system compactions are re-enabled so a system compaction will be scheduled. A custom test planner is set so that system compactions take higher precedence. This means that when the compation coordinator goes to start the next compaction job, it will try and grab the system compaction ahead of the user compaction. Before this change this would have been blocked by the selectedFiles column being set by the user compaction, however because the user compaction has had no jobs run it can proceed and clear the column and run the system compaction. Later the fate operation will re-set it again and the user compaction will run second.

This closes #4454

This change will allow system compactions to postpone user compactions
that have had no jobs run yet. Before this, if a user compaction was in
the queue and had selected files that overlapped it would block system
compactions from running. Now if there are selected files, but the user
compaction is not running and hasn't had any jobs completed, the
coordinator will clear the selectedFiles column so that the system
compaction can run. The fate operation will reset the column again while
trying to make progress.

This closes apache#4454
@cshannon cshannon self-assigned this Apr 20, 2024
@cshannon
Copy link
Contributor Author

I started to work on a follow on to this PR in another branch that is based on the SteadyTime changes from #4494 and is here: cshannon@1997703

It's not tested and just a rough draft of how we could possibly use SteadyTime. I was unsure about a couple of things, first the commit updates CompactionJobGenerator but I'm not sure if we need to also update CompactionCoordinator to use SteadyTime when dealing with selected files. Second, I wasn't sure if the expiration should be an independent check or tied to if compaction jobs have run (ie only expire if compaction jobs run is greater than 0

@cshannon
Copy link
Contributor Author

I started to work on a follow on to this PR in another branch that is based on the SteadyTime changes from #4494 and is here: cshannon@1997703

It's not tested and just a rough draft of how we could possibly use SteadyTime. I was unsure about a couple of things, first the commit updates CompactionJobGenerator but I'm not sure if we need to also update CompactionCoordinator to use SteadyTime when dealing with selected files. Second, I wasn't sure if the expiration should be an independent check or tied to if compaction jobs have run (ie only expire if compaction jobs run is greater than 0

I merged in these changes after SteadyTime was merged into elasticity, the same comments/questions apply

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking at this, posting some comments I have made so far.

It's not tested and just a rough draft of how we could possibly use SteadyTime. I was unsure about a couple of things, first the commit updates CompactionJobGenerator but I'm not sure if we need to also update CompactionCoordinator to use SteadyTime when dealing with selected files. Second, I wasn't sure if the expiration should be an independent check or tied to if compaction jobs have run (ie only expire if compaction jobs run is greater than 0

CompactionCoordinator should probably use steady time, that would cover the following situation.

  • CompactionJobGenerator generates a system compaction based on selected files that are past expiration
  • The coordinator gets this job for reservation and comes to the conclusion that it can reserve the files in the job even though selected because they are passed expiration

I made some suggestions around expired selected files in the coordinator.

@cshannon cshannon marked this pull request as ready for review May 17, 2024 18:56
@cshannon cshannon dismissed keith-turner’s stale review May 17, 2024 20:16

approved by mistake

selectionsSubmitted.put(tablet.getExtent(), filesToCompact);

// TODO: Do we need to handle a race condition for the rejection handler check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re this comment. Could relax the check in the rejection handler to tabletMetadata.getSelectedFiles().getFateId().equals(fateId) || tabletMetadata.getCompacted().contains(fateId) if either of those are true then write when through, but could have changed since to compact some or even completed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright so the full rejection handler should just look like the following? Just want to make sure i have it right.

mutator.submit(tabletMetadata -> tabletMetadata.getSelectedFiles() != null
  && tabletMetadata.getSelectedFiles().getFateId().equals(fateId) ||
  tabletMetadata.getCompacted().contains(fateId));

@cshannon cshannon merged commit 9d4dc21 into apache:elasticity May 18, 2024
8 checks passed
@cshannon cshannon deleted the accumulo-4454 branch May 18, 2024 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants