
Feature request: ability to monitor connection draining #2225

Open
j3parker opened this issue Aug 22, 2023 · 1 comment

Labels: Type: Idea (This issue is a high-level idea for discussion.)
Milestone: Backlog
j3parker commented Aug 22, 2023

(Follow-up from PR #2223.)

What should we add or change to make your life better?

When destinations (and entire clusters) are removed from YARP config, they will still be used for some period of time: we may be waiting on responses to previously-sent requests, and we may even have requests that are about to be sent to a destination but haven't made it out the door yet (e.g. load balancing may have just picked that destination).

It would be helpful to be able to monitor this.

We are investigating polling WeakReferences to outgoing DestinationState/ClusterState objects in a background job as a way to implement this on our side. This may work but feels a little fragile/risky.
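
A minimal sketch of that approach, assuming we can hook our config-change handling to capture destinations as they leave config (the DrainMonitor class, its TrackRemoved hook, and the 10-second poll interval are all our own inventions; only DestinationState comes from Yarp.ReverseProxy.Model):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Yarp.ReverseProxy.Model;

public sealed class DrainMonitor : BackgroundService
{
    // Weak references let the GC tell us when nothing in the proxy pipeline
    // (load balancing, in-flight requests, etc.) can still reach a removed destination.
    private readonly List<(string Id, WeakReference<DestinationState> Ref)> _draining = new();
    private readonly object _gate = new();

    // Called from our config-change handling when a destination disappears from config.
    public void TrackRemoved(DestinationState destination)
    {
        lock (_gate)
        {
            _draining.Add((destination.DestinationId, new WeakReference<DestinationState>(destination)));
        }
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            lock (_gate)
            {
                // Once the DestinationState has been collected, no request can still
                // be routed to it, so we consider it fully drained.
                _draining.RemoveAll(entry =>
                {
                    var drained = !entry.Ref.TryGetTarget(out _);
                    if (drained)
                    {
                        Console.WriteLine($"Destination '{entry.Id}' has drained.");
                    }
                    return drained;
                });
            }

            await Task.Delay(TimeSpan.FromSeconds(10), stoppingToken);
        }
    }
}
```

The fragility is that "drained" is only observed after the GC happens to collect the DestinationState, so the signal can arrive late and its timing is nondeterministic.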

If we were able to subscribe to events (e.g. if there were an interface for this in Yarp.Telemetry.Consumption) rather than poll, that would be convenient.
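
For illustration, something shaped like the existing telemetry consumer interfaces would fit well. Everything below is hypothetical; IDrainTelemetryConsumer and its methods do not exist in Yarp.Telemetry.Consumption today:

```csharp
using System;

// Hypothetical: modeled on the shape of existing interfaces such as
// IForwarderTelemetryConsumer. Nothing here is a real YARP API.
public interface IDrainTelemetryConsumer
{
    // A removed destination has no remaining in-flight or pending requests.
    void OnDestinationDrained(DateTime timestamp, string clusterId, string destinationId);

    // A removed cluster has no remaining in-flight or pending requests on any destination.
    void OnClusterDrained(DateTime timestamp, string clusterId);
}
```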

Why is this important to you?

At the cluster level, we use clusters to roll out new versions of our software: a new cluster is created with new destinations, and routes are switched over to it. We would like to know if/when the old cluster becomes unused so that our deployment tooling can pause before tearing down old resources.

The same thing happens at the destination level, though typically for a different reason: our services scale in and out based on load, and when we scale in we need to choose a server to remove from YARP, wait for its connections to finish, and then terminate the VM.

In both of these scenarios we'd have an upper bound on how long we wait for requests to finish; this monitoring would enable us to react faster in the (common) happy case where things don't hit the upper bound. That upper bound could be high if you occasionally serve large downloads.

karelz (Member) commented Aug 29, 2023

Triage: We can see the value in this; however, it is a rather advanced and pretty rare scenario (this is the first request we've received for it), and the design and implementation are rather involved.
As such, we will move it to the Backlog and wait for more upvotes over time. If the number of upvotes stays very low, we might eventually close it as Won't Fix.

As a workaround, we recommend monitoring the backend itself for inactivity and shutting it down from there, without YARP's involvement.
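
A minimal sketch of that workaround in the backend (the IdleShutdownMiddleware class, its in-flight counter, and the idle threshold are illustrative choices, not an existing YARP or ASP.NET Core feature):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public sealed class IdleShutdownMiddleware
{
    // Static so a health endpoint (or a background job) can query idleness
    // without a reference to the middleware instance.
    private static int _inFlight;
    private static long _lastActivityTicks = DateTime.UtcNow.Ticks;

    private readonly RequestDelegate _next;

    public IdleShutdownMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        Interlocked.Increment(ref _inFlight);
        try
        {
            await _next(context);
        }
        finally
        {
            Interlocked.Exchange(ref _lastActivityTicks, DateTime.UtcNow.Ticks);
            Interlocked.Decrement(ref _inFlight);
        }
    }

    // True when no requests are in flight and none have completed recently;
    // deployment tooling can poll this before terminating the instance.
    public static bool IsIdle(TimeSpan threshold)
    {
        var last = new DateTime(Interlocked.Read(ref _lastActivityTicks), DateTimeKind.Utc);
        return Volatile.Read(ref _inFlight) == 0 && DateTime.UtcNow - last > threshold;
    }
}
```

Registered with app.UseMiddleware<IdleShutdownMiddleware>() and exposed through a small health endpoint, this lets deployment tooling ask the backend directly whether it has drained.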

karelz added this to the Backlog milestone Aug 29, 2023