[Feature Request]: Pass total number of active replicas as a context variable #2889
Comments
Hi @dberardo-com, what you are trying to find here is a "locking" mechanism.
maybe this can be seen as a lock, but the functions themselves are still stateless. Actually i still see this as a deduplication problem, similar to what one would have with kafka streams and others. Basically i am trying to introduce a home-made version of kafka's "consumer groups" if you will. one use case i am thinking of is Jobs: basically one would have a queue of jobs and replicas would only pick jobs whose id is a multiple of their sequence number. The mechanism is quite robust if the number of replicas is fixed and does not change.
is this part of the context object? or could the replica access this value? and is there also a min replicas equivalent? --> this could be used as a workaround to get the total number of replicas. As for the serial number: i know that there is a "worker_id" somewhere in nuclio but i am not sure how that works and whether one could use it as serial_number
worker id is unique per id and thus could be used as a serial number. usage: For our internal stream implementation (v3io) we use a state file which holds all replica holder ids, with a locking mechanism on top of it where each holder tries to acquire a specific "shard". if a rebalance or failure occurs, we perform "resharding" again. This works well, but the pain points are mostly handling errors and recovering from them (no data loss, etc)
ok, but is this number between min_replicas and max_replicas? or is it at least a sequential number? i mean, if i have 2 different functions (A and B), each with 3 replicas, are the worker_ids A: 1, 2, 3 and B: 4, 5, 6, or could they be mixed up as in A: 4, 19, 22 and B: 33, 27, 35?
I assume that a service account will be needed in the Pods to use this feature, right? or does nuclio already inject a service account token inside every pod?
sure enough i could use a shared volume mount, but i wanted to keep it as simple and stateless as possible
It has nothing to do with replicas, only with trigger workers.
you will need to create an SA and provide it. you can pass it via the UI or the function config. the pod will use the one from the function config.
keep in mind that you will need to provide a locking abstraction on top of writing/reading from that file
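A minimal sketch of what such a locking abstraction over a shared state file could look like, assuming a POSIX host and a volume on which `fcntl.flock` advisory locks are honored (this is often not the case on network file systems). The file layout and the names `locked_state`, `acquire_shard` and `release_shards` are hypothetical; this is not how Nuclio or v3io implement it.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def locked_state(path):
    """Yield the state dict under an exclusive lock; write it back on clean exit."""
    # Create the file if it does not exist yet, without truncating it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder has the lock
        try:
            raw = f.read()
            state = json.loads(raw) if raw else {}
            yield state
            # Persist whatever the caller changed while holding the lock.
            f.seek(0)
            f.truncate()
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def acquire_shard(path, holder_id, num_shards):
    """Return the shard owned by holder_id, claiming the first free one if needed."""
    with locked_state(path) as state:
        owners = state.setdefault("owners", {})
        for shard, owner in owners.items():
            if owner == holder_id:
                return int(shard)
        for shard in range(num_shards):
            if str(shard) not in owners:
                owners[str(shard)] = holder_id
                return shard
        return None  # every shard is already taken


def release_shards(path, holder_id):
    """Drop all shards held by holder_id so others can claim them."""
    with locked_state(path) as state:
        owners = state.setdefault("owners", {})
        for shard in [s for s, o in owners.items() if o == holder_id]:
            del owners[shard]
```

The sketch leaves out exactly the pain points mentioned in this thread: detecting a holder that crashed without releasing its shard, and resharding when the number of holders changes.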
yeah, that might be hard to solve without a "serial number" maybe
mmm i see ... so there is no way to extract a sequence number from the context then. and even if this was possible, this number would most likely not be updated if new replicas are spawned due to HPA intervention (btw, any news on the autoscaling topic? - #2767). i can have a more in-depth look at the context object, but from your answer i understand that currently there is no "internal" way to achieve this kind of deduplication / load balancing without the help of an external lock, and also that there is no interest in implementing anything like that in nuclio because it is considered an anti-pattern
Feature Type
Adding new functionality to Nuclio
Changing existing functionality in Nuclio
Removing existing functionality in Nuclio
Problem Description
In order to do load balancing and deduplication it might be useful for each replica of the same function to know:
Feature Description
let's assume that replicas A and B both need to subscribe to the same MQTT topic and cannot use shared subscriptions (because they need to react differently to the same messages). then, in order to prevent A from reacting to all messages, as B would, it could be useful for A to pick up only the "odd" messages and B the "even" ones.
the same principle would of course apply for 3, 4, 5, ... replicas, using the
<sequence_id> % <total_num_replicas>
logic to deduplicate. the total_num_replicas of course needs to change if the number of replicas is scaled up or down, to allow rebalancing of subscriptions among the existing replicas
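The modulo logic above can be sketched roughly as follows. `replica_index` (zero-based) and `total_replicas` are hypothetical inputs: they stand for the values this request asks Nuclio to expose, and are not part of the existing context object as far as this thread establishes.

```python
def owner_of(message_id: int, total_replicas: int) -> int:
    """Return the zero-based index of the replica responsible for a message."""
    if total_replicas < 1:
        raise ValueError("total_replicas must be at least 1")
    return message_id % total_replicas


def should_handle(message_id: int, replica_index: int, total_replicas: int) -> bool:
    """True if this replica should process the message, False if it should skip it."""
    return owner_of(message_id, total_replicas) == replica_index


# With 2 replicas, replica 0 takes the even ids and replica 1 the odd ones.
evens = [m for m in range(6) if should_handle(m, 0, 2)]  # [0, 2, 4]
odds = [m for m in range(6) if should_handle(m, 1, 2)]   # [1, 3, 5]

# After scaling to 3 replicas the same ids are redistributed.
after_scale = [m for m in range(6) if should_handle(m, 0, 3)]  # [0, 3]
```

Every replica has to see the same `total_replicas` at the same time. While the replicas disagree during a scale-up or scale-down, a message can be handled twice or not at all.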
Alternative Solutions
i am not aware of any
Additional Context
No response