
Invalid External Compaction Progress Fix #4452

Open · wants to merge 4 commits into base: 2.1

Conversation

kevinrr888
Contributor

closes #4422

This fixes the error where external compactions report invalid (>100%) progress when the compaction includes bulk imported files. The problem occurs because the estimated number of entries for bulk imported files is 0. Following Keith's suggestion, instead of modifying the bulk import code to compute the estimated entries, I modified the Compactor code to compute an estimate for files that report 0 entries. The estimate is computed by opening the RFile and summing the number of entries in each IndexEntry that overlaps the extent of the external compaction.
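The overlap-summing idea can be sketched with a toy index. Everything here is illustrative: the `IndexEntry` record, the `(prevEndRow, endRow]` extent convention, and the row strings are stand-ins, not the actual RFile/KeyExtent API.

```java
import java.util.List;

public class EstimateSketch {
  // Hypothetical, simplified stand-in for an RFile index entry: the last row
  // covered by an index block and the number of entries in that block.
  record IndexEntry(String lastRow, long numEntries) {}

  // Sum the entry counts of index blocks that may overlap the extent
  // (prevEndRow, endRow]; null bounds mean unbounded, as with a KeyExtent.
  static long estimateEntries(List<IndexEntry> index, String prevEndRow, String endRow) {
    long total = 0;
    String prevLastRow = null; // last row of the preceding block
    for (IndexEntry e : index) {
      // block ends at or before the extent's lower bound
      boolean beforeExtent = prevEndRow != null && e.lastRow().compareTo(prevEndRow) <= 0;
      // block starts after the extent's upper bound
      boolean afterExtent =
          endRow != null && prevLastRow != null && prevLastRow.compareTo(endRow) >= 0;
      if (!beforeExtent && !afterExtent) {
        total += e.numEntries();
      }
      prevLastRow = e.lastRow();
    }
    return total;
  }

  public static void main(String[] args) {
    List<IndexEntry> index = List.of(
        new IndexEntry("c", 100), new IndexEntry("m", 200), new IndexEntry("t", 300));
    // Extent ("c", "m"]: only the block ending at "m" can overlap.
    System.out.println(estimateEntries(index, "c", "m")); // prints 200
    // Unbounded extent: every block overlaps.
    System.out.println(estimateEntries(index, null, null)); // prints 600
  }
}
```

Because index blocks only record their last key, a block that merely touches the extent boundary is still counted, so this is an over-estimate rather than an exact count, which is fine for progress reporting.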

Also added a test to ExternalCompactionProgressIT to ensure that this fix works as intended. The test verifies that the Compactor and the CompactionCoordinator report valid progress.

Unfortunately, I was not able to verify that these changes are also reflected in the Accumulo Monitor page. Large compactions would only briefly show up in the Running Compactions table, always with a progress of 0. I also could not use something like a SlowIterator to slow the compaction down (I believe that since SlowIterator is part of a test package, it isn't available). If anyone is able to verify these changes against the Monitor page, or can provide guidance on how I can better test them there, that would be helpful. I ran the external compaction from jshell using uno.

This fixes the error with external compactions reporting invalid (>100%) progress when the compactions include bulk imported files. Also adds a test to ExternalCompactionProgressIT to ensure that this fix works as intended.
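To illustrate the failure mode with hypothetical numbers: progress is entries read divided by the estimated total for the input files, and a bulk imported file contributes 0 to that estimate, shrinking the denominator. All file sizes below are made up for the sketch.

```java
public class ProgressSketch {
  // Compaction progress as a percentage of the estimated input entries.
  static double progressPercent(long entriesRead, long estimatedTotalEntries) {
    return 100.0 * entriesRead / estimatedTotalEntries;
  }

  public static void main(String[] args) {
    // Hypothetical compaction over two input files of 100 actual entries each.
    long normalFileEstimate = 100; // file written with an entry-count estimate
    long bulkImportEstimate = 0;   // bulk imported file: estimate is 0 (the bug)
    long estimatedTotal = normalFileEstimate + bulkImportEstimate;

    long entriesRead = 200; // the compaction still reads all 200 real entries
    System.out.println(progressPercent(entriesRead, estimatedTotal)); // prints 200.0
  }
}
```

With the fix, the bulk imported file's estimate is recomputed from its RFile index, so the denominator approximates the real total and progress stays at or below 100%.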
@ddanielr
Contributor

ddanielr commented Apr 12, 2024

@kevinrr888 when I wrote #4422 I confirmed that the progress counts in the compaction-coordinator log files were also incorrect and matched the monitor output, because I wasn't sure whether it was just a monitor-specific display problem.

So if the log files report the correct progress I wouldn't worry about the monitor output.

@keith-turner
Contributor

keith-turner commented Apr 17, 2024

Testing the monitor may be possible by pulling JSON from it, but I'm not sure. Did you manually try looking at the monitor with this change? Personally, I think a manual check would be good.

@@ -1555,5 +1561,42 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
public void setCacheProvider(CacheProvider cacheProvider) {
reader.setCacheProvider(cacheProvider);
}

/**
* Returns an estimate of the number of entries that overlap the given extent. This is an
Contributor


Could move this javadoc up to FileSKVIterator.estimateEntries(). It's nice javadoc, and putting it there would make it more widely visible.

String tableName1 = tableNames[0];
String tableName2 = tableNames[1];

try (AccumuloClient client = Accumulo.newClient().from(getClientProps()).build()) {
Contributor


Does this test fail w/o the new changes?

Contributor Author


Yes, it would fail at the verifyProgress() check. The progress would go up to 200% prior to the changes, which is what was expected.

String dir = getDir(client, tableName1);

log.info("Bulk importing files in dir " + dir + " to table " + tableName2);
client.tableOperations().importDirectory(dir).to(tableName2).load();
Contributor


What is the intent of this? It seems like it will bulk import files from a dir in table1 into table2. Bulk import moves files, so that would delete files from table1, but table1 would still have refs to those files in the metadata table. So it seems like it will leave table1 broken. Not a change for this PR, but I'm thinking we should update bulk import to throw an exception if the bulk source dir is under a configured Accumulo volume dir.

Contributor Author


Yes, it bulk imports the files from a dir in table1 to table2. The intent was just to get a directory of RFiles with some data, and creating a table with some data seemed like an easy way to do that. That was the sole purpose of table1 in this test; it is not used after the import. If this is not intended to be allowed, I can create an issue for it.

log.info("Attaching a slow iterator to table " + tableName2);
IteratorSetting setting = new IteratorSetting(50, "Slow", SlowIterator.class);
SlowIterator.setSleepTime(setting, 1);
client.tableOperations().attachIterator(tableName2, setting,
Contributor


Instead of adding the iterator to the table, you could set it on the CompactionConfig object. Setting it on the table is eventually consistent, so it's not certain the compaction will see the iterator. If the iterator is set on the compaction config, then it will be seen.

Contributor Author


I see. There are other places where this is done as well (setting the iterator on the table and then compacting). I can fix those in another PR, or in this PR if you think that should be done.

@kevinrr888
Contributor Author

> Testing the monitor may be possible by pulling json from it, but not sure. Did you manually try looking at the monitor with this change? Personally I think a manual check would be good.

Okay, I just tried my test using uno again with an even larger compaction, and now I can see that the progress does in fact go above 100% prior to these changes. Perhaps my compaction wasn't large enough before to show any progress; I'm not sure. I will build uno with my changes, check the monitor page, and get back to you.

@kevinrr888
Contributor Author

Just confirmed that these changes are reflected in the monitor page. So the logs and the monitor page both no longer show >100% progress.

added break, moved javadoc, iterator now set on the compaction config
Contributor

@dlmarion dlmarion left a comment


I reviewed this and see no issues, but I think we should have @keith-turner review again. It looks like you have answered some of his questions; I just want to make sure that he sees that and concurs. Have you run any other ITs?

@kevinrr888
Contributor Author

> I reviewed this, I see no issues but I think we should have @keith-turner review again. It looks like you have answered some of his questions, I just want to make sure that he sees that and concurs. Have you run any other ITs?

Sounds good. I have only run ExternalCompactionProgressIT.java and the ITs that GitHub runs. Let me know if there are others you think should also be verified.

Member

@DomGarguilo DomGarguilo left a comment


I have verified that the test fails before these changes are applied and passes afterwards.

* @param extent the key extent
* @return the estimate
*/
long estimateEntries(KeyExtent extent) throws IOException;
Member


Would including the word overlap or overlapping in the method name make it more self-evident? Maybe something like estimateOverlappingEntries?

Comment on lines +233 to +236
} else {
progressList.add(EC_PROGRESS.INVALID);
log.warn("An invalid progress {} has been seen. This should never occur.",
rci.progress);
Member


I think it would be nice to either fail the test here instead of adding to the list (quicker runtime), or, where all the progress states are checked at the end, check them all using JUnit's assertAll(). That way it will alert us if multiple asserts fail instead of stopping at the first failure (more coverage).

Contributor Author


Changed to use assertAll(). I could have it fail at that specific check, but we already need to add to a list for the other checks (or at least some way to signal that those progresses have been seen), so I think it is more consistent to just add INVALID to the list. This way also won't end the test early, which may be better for debugging to see how high the progress goes.

Member


Yeah, I agree. I like checking them all instead of failing early in this case.
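For context on the behavior being discussed: JUnit 5's assertAll runs every supplied check before reporting, so a single INVALID progress doesn't hide later failures. Its run-everything-then-report semantics can be sketched in plain Java (this is an analogue, not JUnit's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class AssertAllSketch {
  // Plain-Java analogue of JUnit 5's assertAll: run every check, collect
  // failures, and only throw after all checks have executed.
  static void assertAll(List<Runnable> checks) {
    List<Throwable> failures = new ArrayList<>();
    for (Runnable check : checks) {
      try {
        check.run();
      } catch (AssertionError e) {
        failures.add(e);
      }
    }
    if (!failures.isEmpty()) {
      AssertionError combined =
          new AssertionError(failures.size() + " of " + checks.size() + " checks failed");
      failures.forEach(combined::addSuppressed);
      throw combined;
    }
  }

  public static void main(String[] args) {
    // All three checks run even though the middle one fails.
    try {
      assertAll(List.of(
          () -> {},
          () -> { throw new AssertionError("INVALID progress seen"); },
          () -> {}));
    } catch (AssertionError e) {
      System.out.println(e.getMessage()); // prints "1 of 3 checks failed"
    }
  }
}
```

This is why the test can keep appending INVALID to the list and still surface every failed progress check at the end.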


5 participants