
vmagent: support multiple groups of remoteWrite.urls to support directing samples to different groups #6212

Open · plangdale-roblox opened this issue Apr 30, 2024 · 8 comments

@plangdale-roblox

Is your feature request related to a problem? Please describe

This request builds on top of #6054.

In our current fault-domain-less topology, we configure our vmagents with multiple remoteWrite.urls and use relabelling configs to decide which URL to send samples to. This allows us to use a shared set of vmagents in front of multiple storage clusters.

The work done to address #6054 will require us to define one cluster per fault domain where we currently run a single cluster. In these situations we would like to continue using our shared vmagents. Based on my reading of 8f535f9, this is not currently possible, as all the remoteWrite.urls are treated as a single group.

That means that if we tried to pass the remoteWrite.urls for multiple clusters, it would still treat them all as one storage group and apply the replication factor across all of them, which is not the intended result.

Describe the solution you'd like

To be able to implement this scenario, we will need to be able to direct samples to distinct sets of remoteWrite.urls. (ie: directing them to identifiable storage groups).

At a high level, this would mean being able to assign remoteWrite.urls to identifiable groups, and then having the relabelling configs mapped to specific groups.

Although not strictly required, it would also make sense to specify the replication factor separately for each group (vs having a single global setting applied to all groups).

Describe alternatives you've considered

The only alternative we have today is to run dedicated sets of vmagents in front of each of these storage groups, and our agents would need to be aware of which set of collection agents (not always vmagent) to send which samples to. Currently, the information about where samples go is fully encapsulated in the configuration of the shared vmagent tier. If we moved to dedicated vmagent sets, we would need to size them independently, as well as configure certain external services that push samples to us to become aware of which vmagent set is for which metrics. This would significantly complicate our overall configuration.

Additional information

No response

plangdale-roblox added the enhancement label on Apr 30, 2024
valyala added the vmagent label on May 6, 2024
@hagen1778
Collaborator

The only alternative we have today is to run dedicated sets of vmagents in front of each of these storage groups, and our agents would need to be aware of which set of collection agents (not always vmagent) to send which samples to.

If I understand it correctly, the alternative would look like the following:
[diagram of the proposed two-level topology]

  1. A shared set of vmagents (L1), which receives all the writes from all other agents/services. L1 is responsible for routing the data stream via relabeling rules.
  2. A fault-domain set of vmagents (L2), which receives writes from the L1 vmagents. L2 is responsible for sharding and replicating data across storage groups (an illustrative flag sketch follows this list).
  3. A fault-domain set of vmselects, responsible for reading and deduplicating data from the fault-domain sets of storage groups.
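
For illustration, a rough sketch of how the two layers could be configured using the existing per-URL vmagent flags; the hostnames, file paths and replica count are hypothetical, and the replication flag comes from the #6054 work, so it may not be available in older vmagent versions:

# L1 vmagents: tag samples via the global relabel config, then route them to the
# per-fault-domain L2 sets via per-URL relabel configs
./bin/vmagent \
  -remoteWrite.relabelConfig=/local/relabel_config.yml \
  -remoteWrite.url=http://vmagent-l2-fd1/api/v1/write \
  -remoteWrite.url=http://vmagent-l2-fd2/api/v1/write \
  -remoteWrite.urlRelabelConfig=/local/keep_fd1.yml,/local/keep_fd2.yml

# L2 vmagents (one set per fault domain): shard and replicate across that domain's vminserts
./bin/vmagent \
  -remoteWrite.url=http://vminsert-1/insert/0/prometheus/api/v1/write \
  -remoteWrite.url=http://vminsert-2/insert/0/prometheus/api/v1/write \
  -remoteWrite.url=http://vminsert-3/insert/0/prometheus/api/v1/write \
  -remoteWrite.shardByURL=true \
  -remoteWrite.shardByURLReplicas=2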

Currently, the information about where samples go is fully encapsulated in the configuration of the shared vmagent tier.

According to the scheme above, the routing info still remains encapsulated in the L1 layer.

If we moved to dedicated vmagent sets, we would need to size them independently

Indeed, the L2 vmagents need to be sized and provisioned independently, as do the vminsert/vmselect/vmstorage services. The capacity planning should be pretty straightforward, as these vmagents are responsible only for the replication/sharding process and already receive optimally pre-processed data.

as well as configure certain external services that push samples to us to become aware of which vmagent set is for which metrics.

I don't get this. According to the scheme above, the external services will remain aware of L1 vmagents only.

At a high level, this would mean being able to assign remoteWrite.urls to identifiable groups, and then have the relabelling configs be mapped to specific groups.

I'm afraid of making the configuration params overcomplicated. This topology is already pretty complex; making the L1 layer do everything could be too complicated and would rarely be used by anyone in the community. The topology with multi-level vmagents seems more transparent to me and provides better flexibility. It also doesn't require overcomplicating the configuration of vmagents.

@plangdale-roblox
Author

@hagen1778 Thanks for your reply. I understand why you are proposing this approach, and it is one I considered internally. But there is a real cost associated with running an additional vmagent tier - these consume CPU and RAM resources. As we don't need more vmagents to handle load, adding more to handle this distribution work increases our cost to serve. From that perspective, increased vmagent configuration complexity (which I agree will happen) is the preferable option.

@hagen1778
Collaborator

hagen1778 commented May 7, 2024

But there is a real cost associated with running an additional vmagent tier - these consume CPU and RAM resources.

The L2 vmagents should consume significantly fewer resources than the L1 vmagents, as they don't do scraping/relabeling/aggregation. Resource usage should be about 1 vCPU core and 500MiB of memory per 300K samples/s. If we assume overall ingestion of 100Mil samples/s into the system, the L2 layer will require about 330 CPUs and about 160GiB of RAM (100M / 300K ≈ 333 cores, 333 × 500MiB ≈ 163GiB). With some extra provisioning that would be 400 CPUs and 200GiB of RAM, or 40 CPUs and 20GiB of RAM for 10Mil samples/s, which is still multiple times above the average for most VM setups.

For your case, how significant would the resource cost of adding the L2 be, based on your calculations?

@plangdale-roblox
Author

For the clusters that have this architecture (we call them "sharded clusters", but they are a set of storage/insert/select nodes with a shared agent layer), our aggregate throughput is about 50M samples/s in and 100M samples/s out. In practice, there are six of these sharded clusters, so we would need to run a separate L2 for each of them, and we would want some amount of physical isolation - so keep in mind that we would not be able to deploy the L2 as densely as theoretically possible. When you combine that with the operational overhead of additional moving parts that we have to monitor and handle failures for, it is not something we would want to do casually. Keep in mind that the one-cluster-per-fault-domain design will itself already increase the operational overhead because of the more complex deployment topology.

Thanks.

@hagen1778
Collaborator

hagen1778 commented May 11, 2024

@plangdale-roblox Do you think something like below will work for your case?

./bin/vmagent \
  -remotewrite.url=gr1/vminsert-1-1 \
  -remotewrite.url=gr1/vminsert-1-2 \
  -remotewrite.url=gr1/vminsert-1-3 \
  -remotewrite.urlRelabelConfig=gr1/cfg1 \
  -remotewrite.urlShardByURL=gr1/true \
  -remotewrite.urlShardByURLReplicas=gr1/2 \
  -remotewrite.url=gr2/vminsert-2-1 \
  -remotewrite.url=gr2/vminsert-2-2 \
  -remotewrite.url=gr2/vminsert-2-3 \
  -remotewrite.urlRelabelConfig=gr2/cfg2 \
  -remotewrite.urlShardByURL=gr2/true \
  -remotewrite.urlShardByURLReplicas=gr2/2
[diagram of the proposed per-group configuration]

Also, do you plan to use stream aggregation per each group?

@plangdale-roblox
Author

Thanks @hagen1778. What you've described here should work. Let me describe how we use relabel configs today to make sure there's nothing there that could be problematic in this scheme.

Here is the pattern we use for the agent command line (note the real cluster here has 14 "shards"):

-remoteWrite.relabelConfig=/local/relabel_config.yml
-remoteWrite.url=http://shard-1/insert/0/prometheus/api/v1/write
-remoteWrite.url=http://shard-2/insert/0/prometheus/api/v1/write 
-remoteWrite.url=http://shard-3/insert/0/prometheus/api/v1/write 
-remoteWrite.url=http://shard-4/insert/0/prometheus/api/v1/write
-remoteWrite.urlRelabelConfig=/local/1_url_relabel.yml,/local/2_url_relabel.yml,/local/3_url_relabel.yml,/local/4_url_relabel.yml

The first, global relabel config looks at tags which identify the sample source and adds a tag identifying which shard the sample should go to. We ensure the rules here tag every sample (there is a catch-all rule for any samples that don't match any other rule). The second set of relabel config files then each correspond to one of the remote write URLs and simply keep the samples tagged with the matching shard id.
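
To make that concrete, here is a minimal sketch of what those files could contain; the label names (source, shard) and the match values are assumptions rather than the actual configs:

# /local/relabel_config.yml (global): tag every sample with its destination shard
- source_labels: [source]      # label identifying the sample source (assumed name)
  regex: "billing|payments"
  target_label: shard
  replacement: "1"
- source_labels: [source]
  regex: "ads"
  target_label: shard
  replacement: "2"
# catch-all: any sample still without a shard tag gets a default shard
- source_labels: [shard]
  regex: ""
  target_label: shard
  replacement: "1"

# /local/1_url_relabel.yml (per-URL): keep only the samples tagged for shard 1
- action: keep
  source_labels: [shard]
  regex: "1"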

So, with the syntax you have proposed:

  • the contents of the relabel config files do not change
  • each remotewrite.url is replaced by a set of urls corresponding to a group
  • each of the per-group urlRelabelConfig file locations is prefixed with the group name (eg: gr1) as you described, so that it is applied for the whole group

If it works that way, then we are good.

As a minor point, I think the gr1/ prefix syntax might not work in that literal form due to ambiguity with file paths, but obviously a different prefix pattern can be used.

Thanks!

@hagen1778
Collaborator

hagen1778 commented May 15, 2024

Thanks! What about stream aggregation? Are you going to use it per-group?

@plangdale-roblox
Author

Currently, streaming aggregation is handled by a second set of agents, so part of the relabel config sends aggregation input samples to these dedicated agents. Those agents then write their output to a single "shard" (ie: a single group). So everything we've discussed so far should work just fine for those, and if we ever did find ourselves writing aggregation results to multiple groups, it should still work fine unless I'm missing something.
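
For reference, a rough sketch of such a dedicated aggregation agent using vmagent's per-URL stream aggregation config; the target URL, file path and aggregation rules below are illustrative assumptions:

# dedicated aggregation vmagent: aggregates its input and writes the results to a single group
./bin/vmagent \
  -remoteWrite.url=http://shard-1/insert/0/prometheus/api/v1/write \
  -remoteWrite.streamAggr.config=/local/stream_aggr.yml

# /local/stream_aggr.yml (illustrative rules)
- match: '{__name__=~"http_requests_total"}'
  interval: 1m
  outputs: [total]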
