Allow system compactions to run if zero user compaction jobs have run #4480

cshannon · 2024-04-20T18:00:13Z

This change allows system compactions to postpone user compactions that have not had any jobs run yet. Previously, if a user compaction was in the queue and had selected files that overlapped, it would block system compactions from running. Now, if there are selected files but the user compaction is not running and has not had any jobs completed, the coordinator will clear the selectedFiles column so that the system compaction can run. The fate operation will set the column again while trying to make progress.
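The rule described above can be sketched as a small decision function. This is only an illustration under simplifying assumptions, not the coordinator's actual code: the class, enum, and parameter names are hypothetical, and the real check works on tablet metadata rather than plain booleans and counters.

```java
// Hypothetical sketch: none of these names come from Accumulo.
final class SelectionDecisionSketch {

  /** Possible outcomes when a system compaction job is about to be started. */
  enum Decision { RUN, CLEAR_SELECTION_THEN_RUN, BLOCKED }

  /**
   * Decides what to do with a system compaction job.
   *
   * @param overlapsSelectedFiles true if the job's files overlap a user compaction's selection
   * @param userCompactionRunning true if a job of that user compaction is currently running
   * @param userJobsCompleted number of jobs the user compaction has completed so far
   */
  static Decision decide(boolean overlapsSelectedFiles, boolean userCompactionRunning,
      int userJobsCompleted) {
    if (!overlapsSelectedFiles) {
      // Nothing is selected over these files, so the system compaction is free to run.
      return Decision.RUN;
    }
    if (!userCompactionRunning && userJobsCompleted == 0) {
      // The user compaction has made no progress yet: clear its selection and let the
      // system compaction go first. The user compaction has to select files again later.
      return Decision.CLEAR_SELECTION_THEN_RUN;
    }
    // The user compaction is running or has already completed jobs, so it keeps its files.
    return Decision.BLOCKED;
  }
}
```

In this sketch, a queued user compaction with no running or completed jobs yields CLEAR_SELECTION_THEN_RUN, while one that has already completed a job yields BLOCKED.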

The current test works by creating two tables, making compactions slow with the slow iterator, and disabling system compactions. The test starts compacting one table so the compactor is busy. Next, a user compaction is started that will wait in the queue, and then system compactions are re-enabled so that a system compaction will be scheduled. A custom test planner is set so that system compactions take higher precedence. This means that when the compaction coordinator goes to start the next compaction job, it will try to grab the system compaction ahead of the user compaction. Before this change, this would have been blocked by the selectedFiles column being set by the user compaction. However, because the user compaction has had no jobs run, the coordinator can proceed, clear the column, and run the system compaction. Later the fate operation will set the column again and the user compaction will run second.

This closes #4454

cshannon · 2024-04-28T15:31:30Z

I started to work on a follow-on to this PR in another branch that is based on the SteadyTime changes from #4494; it is here: cshannon@1997703

It's not tested and is just a rough draft of how we could possibly use SteadyTime. I was unsure about a couple of things. First, the commit updates CompactionJobGenerator, but I'm not sure if we also need to update CompactionCoordinator to use SteadyTime when dealing with selected files. Second, I wasn't sure if the expiration should be an independent check or tied to whether compaction jobs have run (i.e. only expire if compaction jobs run is greater than 0).

cshannon · 2024-05-10T21:43:18Z

I merged in the follow-on changes after SteadyTime was merged into elasticity; the same comments and questions from my previous comment still apply.

keith-turner

Still looking at this, posting some comments I have made so far.

> It's not tested and just a rough draft of how we could possibly use SteadyTime. I was unsure about a couple of things, first the commit updates CompactionJobGenerator but I'm not sure if we need to also update CompactionCoordinator to use SteadyTime when dealing with selected files. Second, I wasn't sure if the expiration should be an independent check or tied to if compaction jobs have run (ie only expire if compaction jobs run is greater than 0)

CompactionCoordinator should probably use steady time; that would cover the following situation.

CompactionJobGenerator generates a system compaction based on selected files that are past expiration.
The coordinator gets this job for reservation and concludes that it can reserve the files in the job, even though they are selected, because they are past expiration.

I made some suggestions around expired selected files in the coordinator.
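The situation above comes down to both components applying the same expiration check to the selection. A rough sketch follows, assuming a single shared helper; java.time.Duration stands in for Accumulo's SteadyTime, and every name here is hypothetical. Whether expiration should also depend on how many jobs have run is left out, since that question is still open in this thread.

```java
import java.time.Duration;

// Hypothetical sketch: Duration stands in for SteadyTime, and all names are illustrative.
final class SelectionExpirySketch {

  /** True when the selection was made more than {@code expiration} before {@code now}. */
  static boolean isPastExpiration(Duration selectedAt, Duration now, Duration expiration) {
    return now.minus(selectedAt).compareTo(expiration) > 0;
  }

  /** Generator side: plan a system job over selected files only if the selection expired. */
  static boolean generatorMayPlan(Duration selectedAt, Duration now, Duration expiration) {
    return isPastExpiration(selectedAt, now, expiration);
  }

  /** Coordinator side: reuse the same check so reservation agrees with the generator. */
  static boolean coordinatorMayReserve(Duration selectedAt, Duration now, Duration expiration) {
    return isPastExpiration(selectedAt, now, expiration);
  }
}
```

Because both sides go through the same helper in this sketch, a job that the generator planned over expired selected files is not rejected at reservation time merely because the files are still marked as selected.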

core/src/main/java/org/apache/accumulo/core/metadata/schema/SelectedFiles.java

server/base/src/main/java/org/apache/accumulo/server/compaction/CompactionJobGenerator.java

.../src/main/java/org/apache/accumulo/manager/compaction/coordinator/CompactionCoordinator.java

server/base/src/main/java/org/apache/accumulo/server/compaction/CompactionJobGenerator.java

approved by mistake

.../src/main/java/org/apache/accumulo/manager/compaction/coordinator/CompactionCoordinator.java

keith-turner · 2024-05-17T21:05:11Z

server/manager/src/main/java/org/apache/accumulo/manager/tableOps/compact/CompactionDriver.java

            selectionsSubmitted.put(tablet.getExtent(), filesToCompact);

+            // TODO: Do we need to handle a race condition for the rejection handler check


Re this comment: we could relax the check in the rejection handler to tabletMetadata.getSelectedFiles().getFateId().equals(fateId) || tabletMetadata.getCompacted().contains(fateId). If either of those is true, then the write went through, but the tablet could have changed since then, having compacted some files or even completed.

cshannon · 2024-05-18

Alright, so the full rejection handler should just look like the following? I just want to make sure I have it right.

mutator.submit(tabletMetadata ->
    (tabletMetadata.getSelectedFiles() != null
        && tabletMetadata.getSelectedFiles().getFateId().equals(fateId))
    || tabletMetadata.getCompacted().contains(fateId));
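The relaxed check can be tried out in isolation with stand-in types. This is a minimal sketch, assuming a simplified Tablet class in place of TabletMetadata and plain strings in place of fate ids; only the shape of the predicate is taken from the discussion. The point it illustrates is that the null guard belongs with the equals call, so a tablet with no selected files still passes when its compacted set contains the fate id.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: Tablet is a simplified stand-in for TabletMetadata.
final class RejectionCheckSketch {

  static final class Tablet {
    final String selectedFateId; // null when the tablet has no selected files
    final Set<String> compacted; // fate ids that already finished compacting this tablet

    Tablet(String selectedFateId, Set<String> compacted) {
      this.selectedFateId = selectedFateId;
      this.compacted = compacted;
    }
  }

  /** Builds the check that decides whether a rejected write actually went through. */
  static Predicate<Tablet> acceptedFor(String fateId) {
    return tablet ->
        // Either the selection is still owned by this fate operation ...
        (tablet.selectedFateId != null && tablet.selectedFateId.equals(fateId))
        // ... or the tablet has since completed the compaction for it.
        || tablet.compacted.contains(fateId);
  }
}
```

For example, acceptedFor("F1") accepts a tablet whose selection belongs to F1 and also a tablet with no selection whose compacted set contains F1; it rejects a tablet whose selection belongs to a different fate id.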

test/src/main/java/org/apache/accumulo/test/functional/CompactionIT.java

server/manager/src/main/java/org/apache/accumulo/manager/tableOps/compact/CompactionDriver.java

cshannon requested a review from keith-turner April 20, 2024 18:00

cshannon self-assigned this Apr 20, 2024

cshannon added 3 commits April 26, 2024 07:19

Merge branch 'elasticity' into accumulo-4454

1a0568f

Merge branch 'elasticity' into accumulo-4454

2ea2854

Merge branch 'elasticity' into accumulo-4454

027a033

cshannon added 4 commits May 3, 2024 18:24

Merge branch 'elasticity' into accumulo-4454

4d30418

Merge branch 'elasticity' into accumulo-4454

b58b44e

Use SteadyTime to decide when to clear selected files

0cffeb2

Fix issues after latest merges

d749aec

cshannon added 2 commits May 10, 2024 17:55

Fix AmpleConditionalWriterIT

eec7854

Fix SelectedFilesTest

5e6d9d8

keith-turner reviewed May 13, 2024

cshannon added 6 commits May 17, 2024 09:14

Merge branch 'elasticity' into accumulo-4454

442d7c5

Merge branch 'elasticity' into accumulo-4454

aa46a2a

Fix logic in CompactionJobGenerator and improve serialization

eee8567

Fix SelectedFilesTest

7fc092c

Add SteadyTime to CompactionCoordinator

dafdc87

Add comment

cf7716d

cshannon marked this pull request as ready for review May 17, 2024 18:56

keith-turner previously approved these changes May 17, 2024

server/base/src/main/java/org/apache/accumulo/server/compaction/CompactionJobGenerator.java (outdated, resolved)

cshannon requested a review from keith-turner May 17, 2024 20:15

keith-turner reviewed May 17, 2024

cshannon added 2 commits May 17, 2024 17:19

Improve validation in SelectedFiles constructor

24c2590

Fix formatting

c4e203e

keith-turner approved these changes May 17, 2024

Address comments and update tests

197dbf3

cshannon merged commit 9d4dc21 into apache:elasticity May 18, 2024
8 checks passed

cshannon deleted the accumulo-4454 branch May 18, 2024 16:00
