New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overload Manager - Max Global Downstream Connections #6309
Comments
cc @projectcontour/maintainers |
An issue with this is Gateway API Gateway reconciliation, Listeners can be dynamically changed without a contour restart/recreate so they do need to be generated similarly to other resources |
ah i guess this is just concerning the admin/operator facing Listeners so maybe disregard this point |
To summarize some of the context from here and what we have available today:
It makes sense to me to leave the stats endpoints to ignore the global limit, we still want to be able to serve readiness checks and stats even when envoy is overloaded. the admin Listener (listens on a Unix domain socket) should also be made to ignore the global limit as well. I think in general the linked PR looks like it should work just fine, only thing is on the readiness/liveness check note from above:
at the moment the base install YAMLs don't include a liveness check, only a readiness check (see: contour/examples/contour/03-envoy.yaml Lines 76 to 79 in a485abb
are you all adding your own liveness check? if so it would be interesting to see how that is configured and how that has worked out given we only have a readiness check at the moment, if i'm understanding correctly we could just set that existing Listener to respect the global connection limit so that instances that are overloaded are taken out of the ready Service endpoints but not shut down the admin/stats Listeners should always ignore the global limit and seems valid to have the readiness probe optionally respect the global connection Limit (or always respect it, given this will only matter if you actuall set a limit) |
Part of the edge best practices for running envoy, in addition to the max-heap resource monitors, there's a global connection limit set,
https://www.envoyproxy.io/docs/envoy/latest/configuration/best_practices/edge#best-practices-edge
This is particularly helpful for one of our cases where the connections came in so rapidly that the heap resource monitor based actions couldn't kick in and envoy ended up in a death loop.
Adding the limit itself is not particularly invasive, however, an extension of that our requirements was to make it possible to fail readiness checks, but keep passing liveness checks which leads to some extra listener configuration requirements. The way I've addressed this is by using the
ignore_global_conn_limit
on the configured stats admin listeners. By default all listeners ignore the connection limit, but we add an optional configuration to setup a listener that has the flag set tofalse
.The Draft PR I've opened (#6308) does a minimal configuration update, but it it makes for a very cumbersome interactions between the serve-context config, the contour config CRD, and how it's passed to the underlying internal envoy package. Before I make any other changes (and add more tests, etc.) I'd like to get some feedback on a better path forward re: configuration.
My proposal would be to shift the configuration of the listeners up to the serve command, which would be responsible for constructing the listener configurations from it's parameters, effectively moving the core branch logic out of
v3.StatsListeners
tomain.doServe
. I see an added benefit of reducing how deep the contour apis get passed into the internal components and it keeps the configuration isolated to the serve command instead of having it distributed to the inner parts. I suspect this may simplify some of the test configuration as well so we don't have tor rely on assumptions about the listeners returned (eg, taking the first listener https://github.com/projectcontour/contour/blob/main/internal/featuretests/v3/envoy.go#L597 and taking that for granted in the discovery responses)I wrote a proof of concept change to illustrate what that might look like, which if reasonable will implement in my draft (#6308)
https://github.com/projectcontour/contour/compare/main...seth-epps:options-listener-config?expand=1
The text was updated successfully, but these errors were encountered: