The default value of `availableMemoryRatio` is too low #2423
Comments
This seems to be caused by the following snippet (line 178): crawlee/packages/core/src/autoscaling/snapshotter.ts, lines 176 to 180 at 6f2e6b0.
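For intuition, here is a hedged sketch (not the actual snapshotter.ts source) of what that snippet appears to do: cap the memory the autoscaler may use at `totalMemory * availableMemoryRatio`. The function name and the `0.25` default are assumptions inferred from the "4 GB seen as ~1 GB" behaviour reported in this issue.

```typescript
// Hypothetical sketch of the logic around snapshotter.ts line 178; the
// real implementation lives in crawlee/packages/core/src/autoscaling/.
function maxUsedMemory(totalMemoryBytes: number, availableMemoryRatio: number): number {
  // The autoscaler only ever "sees" this fraction of the machine's memory.
  return Math.floor(totalMemoryBytes * availableMemoryRatio);
}

const FOUR_GB = 4 * 1024 ** 3;
// With an assumed default ratio of 0.25, a 4 GB Actor is treated as ~1 GB:
const visible = maxUsedMemory(FOUR_GB, 0.25); // 1 GiB
const full = maxUsedMemory(FOUR_GB, 1);       // the whole 4 GiB
```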
In such cases, this can be remedied by overriding the defaults:

```typescript
new PlaywrightCrawler(
  {},
  new Configuration({
    availableMemoryRatio: 1,
  }),
);
```
Humph, it'd make sense to me if …
Perhaps we can set the default value of … We might use the …
I'd say let's set it at the base image level, with the Cheerio and plain Node images having a higher ratio than the browser images. But what do you think would be better?
I'm probably missing important info here. If I start a new Crawlee project, I get a Dockerfile based on one of the base images, correct? If I then change the crawler type in my code (a perfectly legit thing IMO), won't the configuration done in the base image just stick? That seems hard to track down...
This is true, but you should also update the image in that case... I guess this is a rough thing to fix... Maybe we can middleground? Expose an env variable from the base images that specifies the image type, and have actor.init decide on the default ratio based on it? Or maybe I'm just high and there's a better solution! I'm just throwing ideas here :D
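The middleground idea above could look something like this sketch. The env variable name `CRAWLEE_IMAGE_TYPE` and all of the ratio values are hypothetical, chosen only to illustrate the mechanism, not taken from any real base image.

```typescript
// Sketch: base images would export a (hypothetical) CRAWLEE_IMAGE_TYPE
// variable, and init would pick a default availableMemoryRatio from it.
function defaultMemoryRatio(): number {
  const imageType = process.env.CRAWLEE_IMAGE_TYPE; // hypothetical variable
  switch (imageType) {
    case 'browser':
      // Browsers allocate a lot outside Node's view, so keep headroom (assumed value).
      return 0.6;
    case 'node':
    case 'cheerio':
      // Plain Node / Cheerio images can safely use all available memory (assumed value).
      return 1;
    default:
      // Conservative fallback when the image type is unknown.
      return 0.25;
  }
}
```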
I realized I haven't commented on this anywhere; we only discussed it with Jindra on Thursday. So here is the thing: we already set this value to 1 on the platform, and it worked just fine until recently. It's done in the SDK in https://github.com/apify/apify-sdk-js/blob/master/packages/apify/src/actor.ts#L203

What I think might have happened is that a wrong config is resolved via AsyncLocalStorage (by default, all places use the global config, which resolves to a scoped one via ALS). If that's the case, it could be caused by #2371.
Could you elaborate how? That ALS is not even in place when you're not working with AdaptivePlaywrightCrawler. Or is this just a hunch that two supposedly independent instances of AsyncLocalStorage may interfere in weird ways? |
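For readers unfamiliar with the pattern being debated: below is a minimal, self-contained sketch of a "global config that resolves to a scoped one via ALS", as described above. The names (`getGlobalConfig`, the `Config` shape) are illustrative, not the actual Crawlee API; the point is only that any code calling the resolver inside a scope silently gets the scoped config instead of the global one.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Illustrative config shape; not the real Crawlee Configuration class.
interface Config {
  availableMemoryRatio: number;
}

const storage = new AsyncLocalStorage<Config>();
const globalConfig: Config = { availableMemoryRatio: 1 };

// Resolver used everywhere: a scoped config on the current async
// context shadows the global default.
function getGlobalConfig(): Config {
  return storage.getStore() ?? globalConfig;
}

// A scoped run (e.g. a crawler installing its own config) wins inside:
storage.run({ availableMemoryRatio: 0.25 }, () => {
  console.log(getGlobalConfig().availableMemoryRatio); // 0.25 inside the scope
});
console.log(getGlobalConfig().availableMemoryRatio); // 1 outside the scope
```

If some code path ran inside an unexpected scope, it would pick up 0.25 instead of the platform's 1, which matches the symptom in this issue.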
Yes, it's a hunch, based on years of experience working with ALS and seeing all the weird edge cases myself (I've been using it since before it became stable). What I am sure about:
It could just as well be some other refactoring, but that particular PR sounds like the ideal first candidate to check. I haven't tried to reproduce this yet; I'm not sure if it surfaces every time or if it's just a fluke. If it happens consistently, I would first try to revert that PR via …

Next time, let's please at least add a link to the Slack discussions to the OP for more context.
I will close this, since it's no longer surfacing in the current version and I haven't been able to confirm my hunch from above either (also, on a second look the PR in question appears safe; it shouldn't affect anything beyond the adaptive crawler even if it were the culprit).
In an Apify Actor with 4 GB of available memory, the `AutoscaledPool` refuses to scale up, as it only sees ~1 GB of free memory. This slows down scrapers and may cause higher costs for users of Crawlee-based Actors (the user is billed per second of available memory, not used memory).