Move Scan Server File refs to their own table #4529

Open
ddanielr opened this issue May 6, 2024 · 5 comments

@ddanielr
Contributor

ddanielr commented May 6, 2024

Is your feature request related to a problem? Please describe.

Scan server file references contain a file path and the UUID of the scan server, which the GC uses to determine whether a deletion candidate is still in use.
So, as scan servers are scaled up to handle user read activity, the number of metadata table entries also increases.

This creates additional load on the metadata tservers for what could be a short-duration action (i.e., scaling in response to high client read volume) and could require additional tservers to be started to host metadata tablets.
Once those tservers are up, the metadata tablets need to be balanced across the new set of tservers.
This could impact other system actions that rely on quick metadata reads and writes, as well as load-balancing settings on the tservers (host regex, split point calculation, etc.).

Likewise, once scan servers are scaled in, the number of tservers would also be scaled in to conserve compute resources.
This scale-in action would trigger a number of hosting-related operations on the metadata table, which could impact system performance.

Describe the solution you'd like

Once #4528 is complete, the scan server file refs no longer need to be managed for the root or metadata tables.
At that point, they should be moved into their own table so that client read behavior is completely separated from the tservers hosting the metadata table.

Auto-scaling actions can then be constrained to a client's resource group, versus creating hosting churn on the metadata table, which affects all users.
Likewise, it further isolates critical system information (tablet hosting, rfile fencing, etc.) from transitory information (scan server references, problem reports, blip markers).

If Accumulo were deployed via a Helm chart, this would also allow tservers to be spun up automatically in a scan server resource group to host the scan server references for that collection of scan servers.

The GC would need to scan this table instead; however, because it is now a full table, it is easier to validate that all references were found.
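
To illustrate, here is a minimal sketch of that GC check, assuming a hypothetical dedicated table named accumulo.scanrefs whose rows are keyed by scan server UUID with the referenced file path in the column qualifier (the table name and layout are illustrative assumptions, not the actual implementation):

```java
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.Set;

import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class ScanRefCheck {
  // Hypothetical dedicated table for scan server file references.
  static final String SCAN_REF_TABLE = "accumulo.scanrefs";

  // Collect every file currently referenced by a scan server so the GC can
  // remove those files from its set of deletion candidates.
  static Set<String> filesInUse(AccumuloClient client) throws Exception {
    Set<String> inUse = new HashSet<>();
    try (Scanner scanner = client.createScanner(SCAN_REF_TABLE, Authorizations.EMPTY)) {
      for (Entry<Key,Value> e : scanner) {
        // Assumed layout: row = scan server UUID, column qualifier = file path.
        inUse.add(e.getKey().getColumnQualifier().toString());
      }
    }
    return inUse;
  }
}
```

Because the GC reads the entire table, validating that all references were seen reduces to checking that this one scan succeeded, rather than scanning for a prefix inside the metadata table.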

Describe alternatives you've considered

I thought about managing the metadata table with automatically created splits, but the scale-in operations always cause metadata interaction impact.

Additional context
The scan server prefix could be dropped in favor of the scan server resource group.
This could help increase the rate at which the manager removes old scan server references, allowing a full resource group to be removed as opposed to having to check each individual UUID.
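
A rough sketch of that group-level cleanup, assuming the refs live in a hypothetical accumulo.scanrefs table with rows keyed as <resourceGroup>;<scanServerUuid>, so one ranged delete covers the whole group:

```java
import java.util.Collections;

import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchDeleter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.security.Authorizations;

public class GroupRefCleanup {
  // Hypothetical table name and row layout, matching the sketch above.
  static final String SCAN_REF_TABLE = "accumulo.scanrefs";

  // Delete every ref written by scan servers in one resource group with a
  // single prefix range, instead of checking each server's UUID.
  static void removeGroupRefs(AccumuloClient client, String resourceGroup) throws Exception {
    try (BatchDeleter deleter = client.createBatchDeleter(SCAN_REF_TABLE,
        Authorizations.EMPTY, 4, new BatchWriterConfig())) {
      deleter.setRanges(Collections.singleton(Range.prefix(resourceGroup + ";")));
      deleter.delete();
    }
  }
}
```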

For a Helm chart deployment, the scan server's resource group could also be used to balance a given resource group's scan server references onto that same resource group's tservers.

@ddanielr ddanielr added the enhancement This issue describes a new feature, improvement, or optimization. label May 6, 2024
@ddanielr ddanielr added this to To do in 3.1.0 via automation May 6, 2024
@EdColeman
Contributor

Elasticity added a table for FATEs. Maybe we would want a more general utility table instead of creating specific tables for each use? Later, if performance showed that there is contention on the utility table, it could then be split into separate tables.

Using fewer tables may lead to easier management. For example, if designated servers are used for the metadata table, it may be desirable to collocate the utility table(s) on those same servers. This could come at the expense of more contention, but generally the individual tables should be small relative to data tables, so there might not be a measurable impact if it were one table.

If a utility table is used, it may help if the design allowed for easy splitting by using variables / properties for the table name instead of it being hard-coded.
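
A rough sketch of that last idea, with an invented property name (nothing here is an existing Accumulo property):

```java
import java.util.Properties;

public class RefTableConfig {
  // Hypothetical property key and default; both invented for illustration.
  static final String REFS_TABLE_PROP = "general.scan.refs.table";
  static final String DEFAULT_REFS_TABLE = "accumulo.util";

  // Resolving the table name from configuration instead of a hard-coded
  // constant means a shared utility table could later be split into
  // dedicated tables without code changes.
  static String refsTable(Properties conf) {
    return conf.getProperty(REFS_TABLE_PROP, DEFAULT_REFS_TABLE);
  }
}
```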

@dlmarion
Contributor

dlmarion commented May 7, 2024

Is this related to, similar to, or a duplicate of, #4493 ?

@ddanielr
Contributor Author

ddanielr commented May 8, 2024

Is this related to, similar to, or a duplicate of, #4493 ?

It's related to #4493.

We tested with the changes in #4510 and it increased scan server performance.
Old Ref:
~sservfile:/data/workspace/accumulo/test/target/mini-tests/
New Ref:
~sserv5e6087acfile:/data/workspace/accumulo/test/target/

For this, it makes sense to drop the ~sserv prefix.
Proposed Ref:
5e6087acfile:/data/workspace/accumulo/test/target/

We should probably figure out a way to split the scan server refs based on resource group name (see the sketch after this list).
This would accomplish a few things.

  1. Allow tserver hosting resources to be dedicated to particular clients (scale a single resource group of scan servers and tservers).
  2. Scan servers would only need to scan for references in their relevant section of the table.
  3. The manager cleanup process for scan server refs could look at resource groups first and delete at that level vs comparing each scan server UUID.
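
A minimal sketch of item 2, under the same assumed <resourceGroup>;<uuid> row layout and hypothetical accumulo.scanrefs table name as above; splitting the table on group boundaries would then keep each group's refs on its own tablets:

```java
import java.util.Map.Entry;

import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class GroupScopedRefScan {
  static final String SCAN_REF_TABLE = "accumulo.scanrefs"; // hypothetical name

  // Restrict the scan to one resource group's section of the refs table, so a
  // reader never touches tablets belonging to other groups.
  static void printGroupRefs(AccumuloClient client, String resourceGroup) throws Exception {
    try (Scanner scanner = client.createScanner(SCAN_REF_TABLE, Authorizations.EMPTY)) {
      scanner.setRange(Range.prefix(resourceGroup + ";"));
      for (Entry<Key,Value> e : scanner) {
        // Assumed layout: row = "<group>;<uuid>", column qualifier = file path.
        System.out.println(e.getKey().getRow() + " -> " + e.getKey().getColumnQualifier());
      }
    }
  }
}
```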

@cshannon
Contributor

cshannon commented May 8, 2024

I added the new FATE table in elasticity, so I could work on this (it looks like it's targeting 3.1?). One thing is that it looks like #4528 is still going to allow eventual scans by a system user, so I'm not sure how that impacts moving these refs to their own table.

Others can comment too, but I don't think we need/want a general utility table. To me it seems like we already have a utility table... it's the metadata table. Anything that has a low amount of writes/contention can just be stored as metadata. The purpose of this new table is specifically that there's too much contention on scan file refs, so it makes sense to put them into their own table, which can be split to optimize scan refs.

Same for keeping FATE separate in elasticity; I think it should be its own table to handle the volume of FATE ops that will be happening.

@ddanielr
Contributor Author

ddanielr commented May 8, 2024

So the reason #4528 was required was that we didn't want scan server refs to end up in the root table as a result of hitting the metadata table with an eventual scan.

However, if all the scan server refs are moved to a separate table, then it doesn't matter, as GC actions against the metadata and root tables will all still look at the same scan server ref table.
