Autoscaler underprovisions for uneven low latency traffic #15000
Hi @Peilun-Li, the KPA autoscaler scrapes the queue-proxy pods every 2 seconds, and each queue-proxy reports its metrics every 1 second, so the concurrency metric is indeed an average over that 1-second reporting period. The autoscaler also accounts for requests proxied through the activator by subtracting them from the final value.

The autoscaler calculates the desired pod count over some window (panic or stable) and assigns one bucket per scrape (for a 60-second stable window that means 30 buckets). It then takes the average over the window to decide the metric value to use (there is an option for a weighted average too). So if a concurrency level is not sustained long enough within each reporting period, you will not see the replica count you expect, because the existing replicas were able to serve the traffic.

I suspect one way to deal with the above scenario is to use rps as the metric, since it is calculated as a rate over time, independently of how requests arrive within the 1-second reporting period. For example, the workload above amounts to 110 rps. You could then have:
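(The configuration snippet appears to have been lost in this copy of the thread. Reconstructed from the numbers quoted in the next message, it would look roughly like the following; the service name is a placeholder and the exact values are an assumption, not the original snippet:)

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service   # placeholder name
spec:
  template:
    metadata:
      annotations:
        # switch from concurrency to a rate-based metric
        autoscaling.knative.dev/metric: "rps"
        # ~14.29 rps per replica at 70% target utilization -> ~10 rps effective
        autoscaling.knative.dev/target: "14.29"
        autoscaling.knative.dev/target-utilization-percentage: "70"
```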
The above will assign 10 rps per replica (14.29 × 0.7 ≈ 10; alternatively, you could set the utilization factor to 100% and the target to 10).
Could you try the above with your use case and see if that helps? Also, for any testing done, it would be helpful to enable debug-level logging for the autoscaler pod and report the logs (they contain valuable information about how the autoscaler behaves).
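The bucketed window averaging described above can be sketched as follows. This is a toy simplification (one bucket per scrape, a plain mean over the buckets), not the actual `knative/pkg` implementation, and the traffic numbers are made up; it only illustrates why one bursty scrape barely moves the window average:

```python
from collections import deque


class WindowAverage:
    """Toy model of the stable window: a fixed number of buckets,
    one per scrape, averaged to produce the decision metric."""

    def __init__(self, window_secs=60.0, scrape_interval=2.0):
        # 60 s window at a 2 s scrape interval -> 30 buckets
        self.buckets = deque(maxlen=int(window_secs / scrape_interval))

    def record(self, value):
        self.buckets.append(value)  # oldest bucket falls off automatically

    def average(self):
        return sum(self.buckets) / len(self.buckets)


w = WindowAverage()
for v in [11] + [1] * 29:   # one bursty scrape among 29 quiet ones
    w.record(v)
print(w.average())          # (11 + 29) / 30, i.e. about 1.33
```

Even though one scrape saw a concurrency of 11, the window average stays near 1, so the desired replica count barely changes.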
Thanks for the context and the idea @skonto, yeah, I think an RPS metric would help and we can try that, but meanwhile I feel it comes with two pain points:
Great suggestion on enabling debug logging for the autoscaler, will try that :)
Ask your question here:
Hi community, we have potentially skewed, low-latency traffic targeting a CPU-bound Knative service. With concurrency-based autoscaling we are seeing high p90+ latency. After we manually increase min-scale to an overprovisioned level, the p90+ latency goes back to a normal level. We suspect this indicates underprovisioning by the autoscaler, and we want to understand the reasons and explore potential solutions.
Hypothetical traffic pattern & example service settings:
Expected behavior: autoscaler scales the service up to 11 (or higher considering the target utilization percentage)
Actual behavior: the autoscaler underprovisions the service and we observe higher p90+ latency.
We studied the autoscaler logic for the concurrency-based metric a bit, and here's our understanding (definitely correct us if we are wrong): the metric the autoscaler tracks is actually AverageConcurrency. Using the hypothetical traffic example above, for each second:
With that (AverageConcurrency = 1.55), it looks like the autoscaler will try to scale up to 2 even though we have a peak concurrency of 11, i.e., the autoscaler underprovisions from the perspective of peak concurrency (though the behavior certainly makes sense for average concurrency).
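The gap between the two notions of concurrency can be shown with a small sketch. The numbers here are illustrative (a burst of 11 simultaneous requests, each lasting 100 ms), not the exact workload from this issue, and the functions are our own toy definitions, not autoscaler code:

```python
def average_concurrency(requests, window=1.0):
    """Time-averaged concurrency over a reporting window:
    total in-flight time of all requests divided by window length.
    Each request is a (start, end) pair in seconds."""
    return sum(min(end, window) - max(start, 0.0)
               for start, end in requests) / window


def peak_concurrency(requests):
    """Maximum number of requests in flight at any instant,
    computed with a classic +1/-1 event sweep."""
    events = sorted((t, d)
                    for start, end in requests
                    for t, d in ((start, 1), (end, -1)))
    peak = cur = 0
    for _, d in events:
        cur += d
        peak = max(peak, cur)
    return peak


# A burst of 11 requests arriving together, each lasting 100 ms:
burst = [(0.0, 0.1)] * 11
print(average_concurrency(burst))  # about 1.1 — what the 1 s report averages to
print(peak_concurrency(burst))     # 11 — what actually hits the pod
```

The averaged value suggests roughly one replica is enough, while the instantaneous load briefly needs eleven, which matches the underprovisioning described above.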
Questions:
TIA for any insights and help!