
Round Robin Load Balancing not working as expected #2151

Open
brendanalexdr opened this issue May 31, 2023 · 15 comments
Labels
Type: Bug Something isn't working

@brendanalexdr

brendanalexdr commented May 31, 2023

Background

I am attempting to deploy 2 replicas of a simple MyTestWebApp in a Docker Swarm environment using the RoundRobin config. The purpose is to gain experience with the RoundRobin config before deploying to production. (Each deployment of MyTestWebApp generates a unique app ID, and when a request hits the controller it is logged to the console.)

Expected Behavior

For each request to the endpoint, my YARP implementation will hit one instantiation of MyTestWebApp, then the second instantiation, then back to the first, and so on, in a round-robin fashion.

The (Possible) Bug

For each request to the endpoint, my YARP implementation hits only one instantiation of MyTestWebApp; no requests hit the second instantiation. If I pause making requests for a period of time (maybe 5 minutes or so), the second instantiation may start being hit, but then the first no longer is.

My Config

"ReverseProxy": {
"Routes": {
  "route1": {
    "ClusterId": "mytestwebapp",
    "Match": {
      "Path": "{**catch-all}",
      "Hosts": [ "mytestwebapp.dev" ]
    }
  }

},
"Clusters": {
  "mytestwebapp": {
    "LoadBalancingPolicy": "RoundRobin",
    "Destinations": {
      "destination1": {
        "Address": "http://mytestwebapp:5023/"
      }
    }
  }
}

Here is my docker compose file:

version: '3.8'

services:
  tempwebapp:
    image: localstore/tempwebapp:1.3
    environment:
      - ASPNETCORE_URLS=http://*:5023
      - ASPNETCORE_ENVIRONMENT=Production
    ports:
      - 5023:5023
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager] 
    networks:
      - localnet
  yarpreverseproxydev:
    image: localstore/yarpreverseproxydev:1.0
    ports:
      - 80:80
      - 443:443
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager]  
    networks:
      - localnet
networks:
  localnet:
    driver: overlay
    attachable: true
    name: localnet  

Console Logs from each instantiation

From tempwebapp in Container 1:
From tempwebapp in Container 1

From tempwebapp in Container 2:
From tempwebapp in Container 2

@brendanalexdr brendanalexdr added the Type: Bug Something isn't working label May 31, 2023
@Tratcher
Member

RoundRobin operates on Destinations, and you've only supplied one. It sounds like another component is doing DNS or TCP load balancing underneath?

    "LoadBalancingPolicy": "RoundRobin",
    "Destinations": {
      "destination1": {
        "Address": "http://mytestwebapp:5023/"
      }

@brendanalexdr
Author

brendanalexdr commented May 31, 2023

RoundRobin operates on Destinations, and you've only supplied one. It sounds like another component is doing DNS or TCP load balancing underneath?

    "LoadBalancingPolicy": "RoundRobin",
    "Destinations": {
      "destination1": {
        "Address": "http://mytestwebapp:5023/"
      }

OK, this is precisely why I was testing. But in a typical clustered environment, across many nodes and with changing replica counts, how do you configure destinations? So YARP can't do load balancing in a dynamic clustered environment?

FYI, DNS is being handled by Windows 11 on my dev box; there's no underlying load balancing going on under the hood. I was thinking YARP would handle this.

@samsp-msft
Contributor

OK, this is precisely why I was testing. But in a typical clustered environment, across many nodes and with changing replica counts, how do you configure destinations? So YARP can't do load balancing in a dynamic clustered environment?

You need a mechanism to resolve the destinations by talking to whatever is doing the dynamic clustering - such as Kubernetes, for which there is a YARP ingress controller. One of the reasons we have the extensibility in YARP is to enable customers to write configuration management that pulls the data from their backend systems.
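
As a rough sketch of that extensibility (not an official sample; the addresses and names below are placeholders), destinations can also be supplied programmatically via the in-memory config provider and refreshed whenever your orchestrator reports a change:

    using Yarp.ReverseProxy.Configuration;

    var builder = WebApplication.CreateBuilder(args);

    // Destinations pulled from your orchestrator (Swarm/k8s API, service registry, etc.).
    // The replica addresses below are placeholders.
    var routes = new[]
    {
        new RouteConfig
        {
            RouteId = "route1",
            ClusterId = "mytestwebapp",
            Match = new RouteMatch { Path = "{**catch-all}" }
        }
    };

    var clusters = new[]
    {
        new ClusterConfig
        {
            ClusterId = "mytestwebapp",
            LoadBalancingPolicy = "RoundRobin",
            Destinations = new Dictionary<string, DestinationConfig>
            {
                ["replica1"] = new DestinationConfig { Address = "http://10.0.0.5:5023/" },
                ["replica2"] = new DestinationConfig { Address = "http://10.0.0.6:5023/" }
            }
        }
    };

    builder.Services.AddReverseProxy().LoadFromMemory(routes, clusters);

    var app = builder.Build();
    app.MapReverseProxy();
    app.Run();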

@brendanalexdr
Author

OK, I got it. So, basically, if I understand correctly, in the case of my Docker Swarm test environment, I would need to use something like HAProxy to mediate the round-robin with the microservices.

@samsp-msft
Contributor

samsp-msft commented Jun 6, 2023

Docker Swarm is similar to Kubernetes in that it manages where the service instances live and how to route to them. You can either use its built-in routing or configure it to export that data via DNS.

The part that is missing from YARP is a DNS provider that will resolve a DNS name to its addresses and regularly poll DNS to check those addresses. YARP's config is a little confusing in that you can specify a destination via a hostname, but we expect that to resolve to a single host.

We need a DNS provider similar to HAProxy's, where you can configure the DNS server and the names to be resolved. YARP would then actively poll DNS to update the host list. AFAIK there is no notification system for DNS, so you need to poll, which means the list will always be a little out of date, depending on how often instances are created and destroyed.

@samsp-msft
Contributor

Keep this open in case #2154 doesn't resolve all the issues

@samsp-msft samsp-msft added this to the Backlog milestone Jun 20, 2023
@MayTakeUpTo8Hours

Similar issue here.

Background

  • Services running on a kubernetes cluster (AKS)
  • k8s deployment of app has 2 replicas (= 2 pods)
  • k8s service already hides the 2 instances (pods) behind one interface
  • k8s service already does balancing per round-robin

Sketch of the environment:
(diagram "Zeichnung1" not reproduced here)

Expected Behavior:

  • YARP forwards traffic to k8s service
  • k8s service takes care of the load-balancing (does round-robin)

Actual Behavior:

  • Requests only hit one pod
  • After some time of inactivity (something like 5-10 mins) it switches over and hits only the other pod
  • No load balancing happens: no matter the load, only one instance receives the requests

When I skip YARP by adding an nginx ingress that goes directly to the service, it works just as expected!
Due to architectural reasons, this sadly does not suffice as a workaround in my case.

The (Possible) Bug

Maybe some sort of keep-alive added by YARP makes the k8s service forward the requests to the same pod all the time?
Sadly, I'm currently not able to properly capture traffic between YARP and the k8s service.

@Tratcher
Member

The k8s Service load balancing is TCP connection based, not HTTP request based, right?

YARP will reuse connections as much as possible, so you'll only get new connections when there is high concurrency. Once there are multiple connections, I assume it still prefers the first one when it's available. This can't really be fixed without moving the load balancing to YARP. The other way is to disable connection re-use but that would cause a number of issues.
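
If someone does want to experiment with that trade-off, connection pooling can be shortened (rather than fully disabled) through a custom forwarder HTTP client factory. This is only a sketch, and the lifetimes below are arbitrary:

    using Yarp.ReverseProxy.Forwarder;

    // Shortens how long outbound connections stay in the pool so the k8s
    // Service gets a chance to spread new connections across pods.
    public class ShortLivedConnectionFactory : ForwarderHttpClientFactory
    {
        protected override void ConfigureHandler(ForwarderHttpClientContext context, SocketsHttpHandler handler)
        {
            base.ConfigureHandler(context, handler);
            handler.PooledConnectionLifetime = TimeSpan.FromSeconds(30);
            handler.PooledConnectionIdleTimeout = TimeSpan.FromSeconds(10);
        }
    }

    // Registration:
    // builder.Services.AddSingleton<IForwarderHttpClientFactory, ShortLivedConnectionFactory>();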

@bford1988

Hey @Tratcher, we're running into this exact issue using YARP as our API Gateway with destinations pointed to k8s services.

We're about to test disabling the connection re-use in YARP, and I was hoping you could expand on what types of issues we may encounter. Thanks for your attention to this issue.

@Tratcher
Member

Disabling connection reuse will cause higher latency, resource usage, and potentially port exhaustion when under heavy load.

@bford1988

Thanks a lot for the response @Tratcher, much appreciated. We will try to test with the connection re-use disabled, but as you previously said, this does not seem like a viable option for a prod environment under heavy load.

If we're unable to access the k8s pods directly from YARP to make use of YARP's load balancing, it appears we may be out of options to resolve this k8s service load balancing issue.

Do you know if there are ongoing plans/efforts to release the Yarp.Kubernetes.Controller project, or has it been abandoned? https://github.com/microsoft/reverse-proxy/blob/main/docs/docfx/articles/kubernetes-ingress.md

Thanks again!

@Tratcher
Member

That's a question for @MihaZupan.

@MihaZupan
Member

Have you tried using the new destination resolvers feature?

services.AddReverseProxy()
+   .AddDnsDestinationResolver() // You may have to lower the frequency - default is 5 min

This would expand the list of destinations YARP sees from the hostnames (service names) to all the addresses returned by DNS. If that returned multiple available pods, YARP's round-robin load balancing should rotate between those.
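
Spelled out a bit more, the registration could look roughly like this (the RefreshPeriod option name reflects my understanding of the service-discovery API; double-check it against the package version you're using):

    builder.Services.AddReverseProxy()
        .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
        .AddDnsDestinationResolver(options =>
        {
            // Assumed option: how often the resolver re-queries DNS (default is ~5 minutes).
            options.RefreshPeriod = TimeSpan.FromSeconds(30);
        });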

@bford1988

bford1988 commented Feb 27, 2024

@MihaZupan Thanks! That's getting us very close. I've added the DestinationResolver, and we also had to add a k8s headless service instead of using our "normal" service as a destination. Now the pod IPs are discoverable and getting set as destinations (as seen from logs added to the DnsDestinationResolver).

It looks like the last hurdle is that the requests are being routed to "PodIpAddress:443" instead of "PodIpAddress:5001". I'm working on resolving this if you have any advice, and then I think we'll have a complete solution. Thank you for the help!

Update: We'll most likely move forward with simply updating the port to 5001 for the pod IP discovered from the k8s headless service hosts. More testing needed, but so far this solution is working.
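
In case it helps others, my understanding is that the resolver keeps the scheme and port of the configured address and only expands the host, so putting the port directly on the headless-service destination should work (the service name below is made up):

    "Destinations": {
      "destination1": {
        "Address": "http://my-headless-service:5001/"
      }
    }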

@bford1988

@MihaZupan @Tratcher Thanks for the help. Using the DnsDestinationResolver and k8s headless services as our destinations, we're successfully able to discover and load balance traffic to our k8s pods.

However, when the gateway is under load and a pod is restarting, we receive many 502/504 errors. In testing, we sent 2 requests per 100ms and received roughly 20-30 502/504 errors during a rolling pod restart. The system will be under a much heavier load in production.

We've tried configuring the health checks using a combination of passive and active checks, and we also tried a "FirstFail" health check policy (as outlined in the documentation). No matter how tight we make these policies, it seems it won't be possible to handle pods cycling as well as our original k8s service without YARP (which only receives one or two 504 errors during a pod cycle).
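
For reference, the shape of config we've been tuning looks roughly like this (paths, intervals, and thresholds are illustrative, not the exact values we used):

    "HealthCheck": {
      "Active": {
        "Enabled": "true",
        "Interval": "00:00:05",
        "Timeout": "00:00:02",
        "Policy": "ConsecutiveFailures",
        "Path": "/healthz"
      },
      "Passive": {
        "Enabled": "true",
        "Policy": "TransportFailureRate",
        "ReactivationPeriod": "00:00:30"
      }
    },
    "Metadata": {
      "ConsecutiveFailuresHealthPolicy.Threshold": "2"
    }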

Do you have any recommendations on how to resolve or improve the problem we're seeing? Also, are there any plans to continue working on or release the Yarp.Kubernetes.Controller package? Thanks for the help, much appreciated.
