
Power-tuning involving only cold starts #176

Open
alexcasalboni opened this issue Aug 8, 2022 · 10 comments

alexcasalboni commented Aug 8, 2022

The tool could provide an option to power-tune a given function considering only cold start invocations.

The current logic is based on aliases in order to maximize parallelism and optimize the overall speed of the power-tuning process. Unfortunately, this also makes cold-start-only tuning hard to achieve.

We could implement an alternative logic such as:

When parameter forceColdStarts (or onlyColdStarts) is provided:
[Initializer] Do nothing (no alias or version needs to be created)
[Executor] Invoke $LATEST sequentially after forcing a cold start (by updating power config and an env variable)

This should work fine for all values of num and any power value.
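A minimal sketch of the Executor idea above, assuming that updating the function configuration (e.g. changing an environment variable) invalidates warm execution environments and forces a cold start on the next invocation. The variable name `LambdaPowerTuningForceColdStart` and the helper are hypothetical, not part of the tool:

```javascript
// Pure helper: merge the function's existing env vars with a unique
// marker, so that every UpdateFunctionConfiguration call is a real
// config change (which forces Lambda to spin up a fresh environment).
function buildColdStartEnv(existingEnv, iteration) {
  return {
    ...existingEnv,
    LambdaPowerTuningForceColdStart: String(iteration),
  };
}

// Usage with the AWS SDK for JavaScript v3 (not executed here):
//
// const { LambdaClient, UpdateFunctionConfigurationCommand, InvokeCommand } =
//   require('@aws-sdk/client-lambda');
// const lambda = new LambdaClient({});
// for (let i = 0; i < num; i++) {
//   await lambda.send(new UpdateFunctionConfigurationCommand({
//     FunctionName: functionName,
//     MemorySize: powerValue,
//     Environment: { Variables: buildColdStartEnv(currentEnv, i) },
//   }));
//   // wait until LastUpdateStatus is 'Successful', then invoke $LATEST:
//   await lambda.send(new InvokeCommand({ FunctionName: functionName }));
// }
```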

The only drawback is that nothing can be parallelized, which isn't a big issue as long as each invocation is short enough. For example, if the average cold invocation takes 5 seconds, with num=20 and 5 power values, the overall power-tuning process will take about 8 minutes. It's very easy to reach 40+ minutes with 10s invocations and num>50.
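The back-of-the-envelope math above can be sketched with a hypothetical helper (not part of the tool): with fully sequential execution, total time is simply the product of the three factors.

```javascript
// Estimate the duration of a fully sequential cold-start-only run:
// every invocation waits for the previous one to finish.
function estimateSequentialMinutes(avgColdSeconds, num, powerValues) {
  return (avgColdSeconds * num * powerValues) / 60;
}

// 5s average cold invocation, num=20, 5 power values → ~8 minutes
console.log(estimateSequentialMinutes(5, 20, 5));  // ≈ 8.3
// 10s invocations, num=50, 5 power values → 40+ minutes
console.log(estimateSequentialMinutes(10, 50, 5)); // ≈ 41.7
```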

alexcasalboni self-assigned this Aug 8, 2022
@alexcasalboni
Owner Author

This is somewhat related to the (closed) issue #123.

@Parro what do you think of this approach?

Twitter thread with Paul Johnston for reference: https://twitter.com/alex_casalboni/status/1556585120332759040


Parro commented Aug 9, 2022

I was thinking of a different approach: in the Initializer step we could create a set of different versions of the Lambda function, all with the same code. This way, every invocation of a distinct version should spawn a new execution environment with its own cold start. We could still use parallelization, and even add a new step to the state machine so that the report includes statistics for both Duration and Init Duration.
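The idea above could be sketched as one alias per (power value, invocation) pair, so that each parallel invocation hits its own version and gets its own cold start. The naming scheme and helper are hypothetical:

```javascript
// Pure helper: one alias name per (power value, invocation index) pair.
// Each parallel invocation targets a distinct version/alias,
// so every invocation experiences a cold start.
function coldStartAliasNames(powerValues, num) {
  const names = [];
  for (const power of powerValues) {
    for (let i = 0; i < num; i++) {
      names.push(`coldstart-${power}-${i}`);
    }
  }
  return names;
}

// Usage (not executed): for each name, publish a version and create the
// alias with the AWS SDK v3 (PublishVersionCommand / CreateAliasCommand).
// Note: PublishVersion only creates a new version when code or config
// changed, so a throwaway config tweak may be needed between publishes.

console.log(coldStartAliasNames([128, 256], 2));
// → [ 'coldstart-128-0', 'coldstart-128-1', 'coldstart-256-0', 'coldstart-256-1' ]
```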

What do you think about it?

@alexcasalboni
Owner Author

@Parro yes, that's what Paul proposed too.

Let's double-check the Lambda quotas :)

Is there any limit regarding the # of aliases per function? Or any API rate-limiting when creating new versions and aliases? I never encountered any limitations since we only create one version/alias per power value.

Let's assume there are no such limitations.

With x power values and num invocations, we'll need to create x * num versions and aliases, so that we can invoke them in parallel. I often run the tool with 5 power values and num=50 (or even 100), so that means 250+ versions and aliases created during initialization.

I'd agree this mechanism is better for the overall execution time, even if the initialization phase will take much longer. As far as I can remember, the creation of new versions/aliases cannot be parallelized. Initializing 4-5 versions currently takes 7-8 seconds. With num=50 it will take more than 6 minutes.
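The initialization estimate above can be sketched as follows, assuming version creation stays sequential at roughly 1.6 seconds per version (consistent with 7-8 seconds for 4-5 versions); the helper is hypothetical:

```javascript
// With x power values and num invocations, this approach needs
// x * num versions/aliases, created one after another.
function initializationEstimate(powerValues, num, secondsPerVersion) {
  const versions = powerValues * num;
  return { versions, minutes: (versions * secondsPerVersion) / 60 };
}

// 5 power values, num=50, ~1.6s per version → 250 versions, ~6.7 minutes
console.log(initializationEstimate(5, 50, 1.6));
```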


Parro commented Aug 9, 2022

Is there any limit regarding the # of aliases per function?

The only limit I am aware of is the code storage quota of 75 GB. In an account with few Lambda functions it should not be a problem; in an account with dozens of them we could hit the limit. And of course it depends on the size of the function under test. We could state clearly in the documentation that the test will use approximately lambdaSize * powerValues * num of storage.
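A quick sketch of that storage check (the helper name and the 50 MB example package size are hypothetical; the 75 GB code-storage quota is the default per account/region):

```javascript
// Each published version stores its own copy of the deployment package,
// so the test consumes roughly lambdaSize * powerValues * num of storage.
function codeStorageEstimateGB(lambdaSizeMB, powerValues, num) {
  return (lambdaSizeMB * powerValues * num) / 1024;
}

// e.g. a 50 MB package, 5 power values, num=50
// → ~12.2 GB of the 75 GB quota
console.log(codeStorageEstimateGB(50, 5, 50));
```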

As far as I can remember, the creation of new versions/aliases cannot be parallelized.

Is it a Lambda limitation? Would it fail even if we used a Map step in the state machine?

Anyway, even if the initialization time is long, we could state this in the docs to warn the user.

@alexcasalboni
Owner Author

Is it a Lambda limitation? Would it fail even if we used a Map step in the state machine?

Yes, because you're always working on $LATEST when creating new versions and aliases.

I've just implemented a first iteration of this (both the initializer and cleaner logic). I'm going to run some tests and share the WIP code in a new PR later today.

@alexcasalboni
Owner Author

@Parro it works :) Check out the PR #177

@ryancormack
Contributor

Hey @alexcasalboni @Parro, I was wondering if there's any movement on this? I noticed a few open PRs that seem to be working, but not much recent activity. Is there anything I could help with if there are some rough edges that need a hand?

If there's one PR that's likely the direction this will move in (assuming the feature is still planned), I could just clone that version and deploy it short term.

Thanks

@alexcasalboni
Owner Author

@ryancormack thanks for checking :) yes, we're definitely moving forward to find the ideal solution for this!

Currently, there are two open PRs using different approaches:

  • Add parameter to power-tune only cold starts #177 (a bit old at this point) creates num new versions/aliases for each memory configuration. It does work, but it's a bit extreme in the amount of overhead and the number of API calls it generates - you easily end up creating and destroying hundreds of versions/aliases - and it also imposes an upper bound of approximately 500 versions/aliases that can be created in 15 minutes (limiting the number of configurations * invocations you can test)
  • Add parameter to power-tune only cold starts #206 moves the version/alias creation into a state machine loop, removing the above constraint at the expense of making the state machine more complex and expensive to run (for all use cases, not only cold starts)

That said, I'm quite sure the second PR is closer to the direction we'll choose and I'd recommend you clone that version for the time being. I'm currently working with a few colleagues at AWS to speed up the maintenance of this tool, so I'd expect we'll settle on a final solution in the next 30-60 days 🚀

@ryancormack
Contributor

Thanks Alex, it's working mostly well. I've created that issue above - I know it was briefly mentioned earlier in this issue, but I don't know if it's more nuanced than that; neither PR currently accounts for it.

The tool was super helpful in actually spawning a huge number of cold starts for me, and I could use CloudWatch Logs queries to get the 'end user latency' times I was hoping to get.

@alexcasalboni
Owner Author

Quick update on this: we're continuing our work on #206 - it turns out that approach is also useful to solve a SnapStart-related problem.

Apologies for the delay, we should be able to finalize the current implementation in a matter of weeks.
