Unhosted user tablets prevent balancing of metadata table #4515

Open
keith-turner opened this issue May 1, 2024 · 2 comments
Labels
bug This issue has been verified to be a bug.

Comments

@keith-turner
Contributor

keith-turner commented May 1, 2024

Describe the bug

When starting Accumulo and assigning many tablets, if the metadata table is not initially balanced, it will be prevented from balancing. This can leave the metadata tablets on far fewer tablet servers than possible at a time when the system is busy assigning and loading user tablets, which generates a lot of load on the metadata table.

Overall this seems to be caused by the fact that the code related to balancing does not consider the different levels of Accumulo.

#4475 is related to this larger problem. The specific code that prevents balancing is here, but fixing this issue would be a larger change than just that code.

This problem is present in 2.1 and later.

Expected behavior

The manager can balance the metadata table independently of what is going on with user tables.
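
One way to picture the expected behavior is a per-level check in place of a global one. The sketch below is illustrative only and is not Accumulo's balancing code: the `Level` enum, both method names, and the rule that a level waits only on itself and the levels above it are assumptions made for this example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class LevelAwareBalanceSketch {

  // Hypothetical stand-in for the levels of tables, ordered from most to least fundamental.
  enum Level { ROOT, METADATA, USER }

  // Simplified model of the behavior described in this issue:
  // any unhosted tablet, at any level, blocks balancing of every level.
  static Set<Level> balanceableLevelsGlobalCheck(Map<Level, Integer> unhosted) {
    boolean anyUnhosted = unhosted.values().stream().anyMatch(n -> n > 0);
    return anyUnhosted ? EnumSet.noneOf(Level.class) : EnumSet.allOf(Level.class);
  }

  // Assumed alternative: a level may be balanced once it and every level
  // above it have no unhosted tablets. Lower levels do not hold it back.
  static Set<Level> balanceableLevels(Map<Level, Integer> unhosted) {
    Set<Level> eligible = EnumSet.noneOf(Level.class);
    for (Level level : Level.values()) {
      if (unhosted.getOrDefault(level, 0) > 0) {
        break; // this level and everything below it must wait
      }
      eligible.add(level);
    }
    return eligible;
  }

  public static void main(String[] args) {
    // Startup scenario: root and metadata are hosted, many user tablets are not.
    Map<Level, Integer> unhosted = new EnumMap<>(Level.class);
    unhosted.put(Level.USER, 5000);

    System.out.println("global check:    " + balanceableLevelsGlobalCheck(unhosted));
    System.out.println("per-level check: " + balanceableLevels(unhosted));
  }
}
```

With the per-level check, the metadata table stays eligible for balancing while user tablets are still being assigned, which is the situation this issue describes.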

keith-turner added the bug label May 1, 2024
@EdColeman
Contributor

Is there any chance that you started the manager before the tservers (or at least most of them)? If you start the tservers first, they will sit there waiting for assignments. When the manager starts, it will assign the metadata table before user tables, and the metadata tablets usually seem to end up distributed as they were at shutdown.

If you start the manager first, then as the tservers start, the manager will immediately begin assignments as soon as it sees the first tserver. This usually ends up with the metadata and a large number of tablets assigned to one (or very few) tservers, and the rebalancing will then take a long time before things get back to normal.

There is a property, MANAGER_STARTUP_TSERVER_AVAIL_MIN_COUNT, that makes the manager wait for N tservers before assignments start, which can mitigate this.

Balancing system tables separately would be a good feature, but there may be procedural steps that require no code changes and help keep the issue from occurring.
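
The idea of holding assignments until N tservers are available can be sketched as a simple polling gate. This is a minimal illustration and not how the manager implements the property: `waitForTservers`, its parameters, and the simulated tserver count are all hypothetical names introduced for this example.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

public class StartupGateSketch {

  // Polls the live tserver count until it reaches minCount or maxWait elapses.
  // Returns true if the threshold was reached, false on timeout or interrupt.
  static boolean waitForTservers(IntSupplier liveTservers, int minCount,
      Duration maxWait, Duration pollInterval) {
    long deadline = System.nanoTime() + maxWait.toNanos();
    while (liveTservers.getAsInt() < minCount) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up waiting; the caller decides whether to proceed anyway
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Simulated cluster: one more tserver becomes visible on each poll.
    int[] seen = {0};
    boolean ready = waitForTservers(() -> ++seen[0], 3,
        Duration.ofSeconds(5), Duration.ofMillis(10));
    System.out.println("ready=" + ready + " tservers=" + seen[0]);
  }
}
```

A gate like this only helps when the manager starts before most tservers. It would not address the case reported in this issue, where all tablet servers were already running.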

@keith-turner
Contributor Author

Is there any chance that you started the manager before the tservers (or at least most of them)?

Started all tablet servers first.
