-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid External Compaction Progress Fix #4452
base: 2.1
Are you sure you want to change the base?
Conversation
This fixes the error with external compactions reporting invalid (>100%) progress when the compactions include bulk imported files. Also adds a test to ExternalCompactionProgressIT to ensure that this fix works as intended.
@kevinrr888 when I wrote #4422 I had confirmed that the progress counts in the compaction-coordinator log files were also incorrect values which matched the monitor output because I wasn't sure if it was just a monitor specific display problem. So if the log files report the correct progress I wouldn't worry about the monitor output. |
Testing the monitor may be possible by pulling json from it, but not sure. Did you manually try looking at the monitor with this change? Personally I think a manual check would be good. |
@@ -1555,5 +1561,42 @@ private void setInterruptFlagInternal(AtomicBoolean flag) { | |||
public void setCacheProvider(CacheProvider cacheProvider) { | |||
reader.setCacheProvider(cacheProvider); | |||
} | |||
|
|||
/** | |||
* Returns an estimate of the number of entries that overlap the given extent. This is an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could move this javadoc up to FileSKVIterator.estimateEntries(). Its nice javadoc, putting it there would make it more widely visible.
String tableName1 = tableNames[0]; | ||
String tableName2 = tableNames[1]; | ||
|
||
try (AccumuloClient client = Accumulo.newClient().from(getClientProps()).build()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this test fail w/o the new changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, would fail at the verifyProgress()
check. The progress would go up to 200% prior to the changes, which is what was expected.
String dir = getDir(client, tableName1); | ||
|
||
log.info("Bulk importing files in dir " + dir + " to table " + tableName2); | ||
client.tableOperations().importDirectory(dir).to(tableName2).load(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent of this? It seems like it will bulk import files from a dir in table 1 into table 2. Bulk import moves files so that would delete files from table1, but table1 would still have refs to those files in the metadata table. So seems like it will leave table1 broken. Not a change for this PR, but thinking we should update bulk import to throw an exception if the bulk source dir is under a configured accumulo volume dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it bulk imports the files from a dir in table1 to table2. The intent was just to get a directory with RFiles with some data, thought just creating a table with some data would be an easy way to do this. That was the sole purpose of table1 in this test and is not used after the import. If this is not intended to be allowed, I can create an issue for it.
log.info("Attaching a slow iterator to table " + tableName2); | ||
IteratorSetting setting = new IteratorSetting(50, "Slow", SlowIterator.class); | ||
SlowIterator.setSleepTime(setting, 1); | ||
client.tableOperations().attachIterator(tableName2, setting, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding the iterator to the table could set it on the CompactionConfig object. Setting it on the table is eventually consistent, so its not certain the compaction will see the iterator. If the iterator is set on the compaction config, then it will be seen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. There are other places where this is done as well--setting it on the table and then compacting. I can fix these in another PR, or in this PR if you think this should be done.
Okay, had just tried my test using uno again with an even larger compaction and now I can see that the progress does in fact go above 100% prior to these changes. Perhaps my compaction wasn't large enough before to show any progress, I'm not sure. Will build uno using my changes and check the monitor page and get back to you. |
Just confirmed that these changes are reflected in the monitor page. So the logs and the monitor page both no longer show >100% progress |
added break, moved javadoc, iterator now set on the compaction config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this, I see no issues but I think we should have @keith-turner review again. It looks like you have answered some of his questions, I just want to make sure that he sees that and concurs. Have you run any other ITs?
Sounds good. I have only run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have verified that the test fails before these changes are applied and passes afterwards.
* @param extent the key extent | ||
* @return the estimate | ||
*/ | ||
long estimateEntries(KeyExtent extent) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would including the word overlap
or overlapping
in the method name make it more self-evident? Maybe something like estimateOverlappingEntries
?
} else { | ||
progressList.add(EC_PROGRESS.INVALID); | ||
log.warn("An invalid progress {} has been seen. This should never occur.", | ||
rci.progress); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to either fail the test here instead of adding it to the list (quicker runtime), or where all the progress states are checked at the end, check them all by using Junits assertAll()
that way it will alert us if multiple asserts fail instead of just stop at the first failure (more coverage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use assertAll()
. Could have it fail for that specific check there, but we already need to add to a list for the other checks (or at least some way to signal that those progresses have been seen), so I think it is more consistent to just add INVALID to the list. This way also won't end the progress early, which may be better for debugging to see how high the progress goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I agree. I like checking them all instead of failing early in this case.
closes #4422
This fixes the error with external compactions reporting invalid (>100%) progress when the compactions include bulk imported files. This occurs because the estimated number of entries for bulk imported files is 0. Following the suggestion of Keith, instead of modifying the bulk import code to compute the estimated entries, I modified the Compactor code to compute the estimated number of entries on files with 0 entries. This value is computed by opening the RFile and summing the number of entries in each of the IndexEntry's which overlap with the extent of the external compaction.
Also added a test to ExternalCompactionProgressIT to ensure that this fix works as intended. This tests that the Compactor and CompactionCoordinator report a valid progress.
I unfortunately was not able to ensure that these changes are also reflected in the Accumulo Monitor page. Large compactions would only briefly show up in the Running Compactions table, always with a progress of 0. I also could not use something like a SlowIterator to slow the progress down (I believe since SlowIterator is part of a test package, it isn't available). If anyone is able to ensure that these changes are reflected in the Monitor page, or is able to provide guidance on how I can better test this against the Monitor page, that would be helpful. The environment I used to run an external compaction was jshell using uno.