Handling passive health check on multiple instances of Yarp #2405

Open
jayendranarumugam opened this issue Feb 16, 2024 · 0 comments
@jayendranarumugam

I'm following this code, which is implemented on top of YARP:

using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.Health;
using Yarp.ReverseProxy.Model;

// Passive health check policy that marks a destination as unhealthy
// when it responds with 429 (throttled) or a 5xx server error.
public class ThrottlingHealthPolicy : IPassiveHealthCheckPolicy
{
    public static readonly string ThrottlingPolicyName = "ThrottlingPolicy";
    private readonly IDestinationHealthUpdater _healthUpdater;

    public ThrottlingHealthPolicy(IDestinationHealthUpdater healthUpdater)
    {
        _healthUpdater = healthUpdater;
    }

    public string Name => ThrottlingPolicyName;

    public void RequestProxied(HttpContext context, ClusterState cluster, DestinationState destination)
    {
        var headers = context.Response.Headers;

        if (context.Response.StatusCode is 429 or >= 500)
        {
            // Default back-off, used when no header carries a parsable number of seconds.
            var retryAfterSeconds = 10;

            // Prefer Retry-After, then fall back to the rate-limit reset headers.
            if (headers.TryGetValue("Retry-After", out var retryAfterHeader) && retryAfterHeader.Count > 0 && int.TryParse(retryAfterHeader[0], out var retryAfter))
            {
                retryAfterSeconds = retryAfter;
            }
            else if (headers.TryGetValue("x-ratelimit-reset-requests", out var rateLimitResetRequests) && rateLimitResetRequests.Count > 0 && int.TryParse(rateLimitResetRequests[0], out var rateLimitResetRequest))
            {
                retryAfterSeconds = rateLimitResetRequest;
            }
            else if (headers.TryGetValue("x-ratelimit-reset-tokens", out var rateLimitResetTokens) && rateLimitResetTokens.Count > 0 && int.TryParse(rateLimitResetTokens[0], out var rateLimitResetToken))
            {
                retryAfterSeconds = rateLimitResetToken;
            }

            // The destination stays unhealthy for the reactivation period, in this instance's memory only.
            _healthUpdater.SetPassive(cluster, destination, DestinationHealth.Unhealthy, TimeSpan.FromSeconds(retryAfterSeconds));
        }
    }
}
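
A side note on the header parsing above: `int.TryParse` only handles the delta-seconds form of `Retry-After`, while HTTP also allows an HTTP-date there. Below is a minimal sketch of a helper that accepts both forms using `RetryConditionHeaderValue` from `System.Net.Http.Headers`. The `RetryAfterParser` class and its `Parse` signature are hypothetical names introduced for illustration; they are not part of YARP or of the code I'm following.

```csharp
using System;
using System.Net.Http.Headers;

// Hypothetical helper, not part of YARP: turns a Retry-After header value
// into a wait period, falling back to a default when it cannot be parsed.
public static class RetryAfterParser
{
    public static TimeSpan Parse(string? headerValue, DateTimeOffset now, TimeSpan fallback)
    {
        if (string.IsNullOrWhiteSpace(headerValue)
            || !RetryConditionHeaderValue.TryParse(headerValue, out var parsed))
        {
            return fallback;
        }

        // Form 1: delta-seconds, e.g. "Retry-After: 30".
        if (parsed.Delta is TimeSpan delta)
        {
            return delta;
        }

        // Form 2: HTTP-date, e.g. "Retry-After: Fri, 16 Feb 2024 10:00:30 GMT".
        if (parsed.Date is DateTimeOffset date)
        {
            var wait = date - now;
            // A date in the past gives no useful back-off, so use the fallback.
            return wait > TimeSpan.Zero ? wait : fallback;
        }

        return fallback;
    }
}
```

In the policy, the result could be passed directly as the reactivation period to `SetPassive` in place of `TimeSpan.FromSeconds(retryAfterSeconds)`. Passing `now` in explicitly is just a choice that keeps the helper easy to test.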

One of its limitations is:

This solution uses local memory to store the endpoints' health state. That means each instance has its own view of the throttling state of each OpenAI endpoint. At runtime, the following can happen:

Instance 1 receives a customer request and gets a 429 error from backend 1. It marks that backend as unavailable for X seconds and then reroutes the request to the next backend.
Instance 2 receives a customer request and sends it to backend 1 again, because its local list of backends has no record that instance 1 marked backend 1 as throttled. Backend 1 responds with a 429 error again, so instance 2 also marks it as unavailable and reroutes the request to the next backend.
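
To make the scenario concrete, here is a rough sketch of the kind of shared state I have in mind. Everything in it is hypothetical and introduced only for illustration: `IThrottleStateStore`, `InMemoryThrottleStateStore`, and `ProxyInstance` are not YARP types, and the in-memory dictionary merely stands in for whatever external store all instances would actually share.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical abstraction, not a YARP API: a place where every proxy
// instance records and reads "destination X is throttled until time T".
public interface IThrottleStateStore
{
    void MarkThrottled(string destinationId, DateTimeOffset until);
    bool IsThrottled(string destinationId, DateTimeOffset now);
}

// In-memory stand-in so the sketch runs on its own. A real deployment would
// need an implementation backed by an external store shared by all instances.
public sealed class InMemoryThrottleStateStore : IThrottleStateStore
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _until = new();

    public void MarkThrottled(string destinationId, DateTimeOffset until) =>
        // Keep the later expiry if two instances report the same destination.
        _until.AddOrUpdate(destinationId, until,
            (_, existing) => existing > until ? existing : until);

    public bool IsThrottled(string destinationId, DateTimeOffset now) =>
        _until.TryGetValue(destinationId, out var until) && now < until;
}

// Hypothetical model of one proxy instance choosing among backends.
public sealed class ProxyInstance
{
    private readonly IThrottleStateStore _store;

    public ProxyInstance(IThrottleStateStore store) => _store = store;

    // Records a 429 from a backend so that other instances can see it.
    public void ReportThrottled(string destinationId, DateTimeOffset now, TimeSpan retryAfter) =>
        _store.MarkThrottled(destinationId, now + retryAfter);

    // Returns the first backend not currently marked as throttled, or null.
    public string? PickDestination(string[] destinations, DateTimeOffset now)
    {
        foreach (var destination in destinations)
        {
            if (!_store.IsThrottled(destination, now))
            {
                return destination;
            }
        }
        return null;
    }
}
```

If two `ProxyInstance` objects are built over the same store, a `ReportThrottled` call on the first makes `PickDestination` on the second skip that backend until the expiry passes. If each instance gets its own store, the second instance still picks backend 1, which is the behavior described above.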


Question:

Is there an option for storing this endpoint health state in a centralized place instead of local memory? Local memory may not work well when multiple instances of YARP are running.

@jayendranarumugam jayendranarumugam added the Type: Idea This issue is a high-level idea for discussion. label Feb 16, 2024
@MihaZupan MihaZupan self-assigned this Mar 21, 2024
@MihaZupan MihaZupan added this to the Backlog milestone Apr 9, 2024