Replies: 1 comment
-
I don't think we can do much better than this right now without having the actors talk to each other and share their view of the pool, but this approach should minimize collisions as much as possible by checking the current state before each write. I am wondering, though, how the pool will behave in the now rarer collision case, for example when the total number of sessions ends up higher than the max pool size. I can't think of any other serious implications right now.
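One hypothetical way to handle the over-capacity case would be to trim the merged pool before persisting it. The sketch below is only an illustration: `PooledSession`, `usageCount` and `trimPool` are made-up names, and dropping the most-used sessions first is an assumed policy, not how the SDK's session pool behaves.

```typescript
// Hypothetical session shape; not taken from the Apify SDK.
interface PooledSession {
  id: string;
  usageCount: number;
  retired: boolean;
}

// Trim a merged pool so that at most maxPoolSize sessions remain.
// Retired sessions are dropped first; if the pool is still too large,
// the most-used sessions are dropped (an assumed policy).
function trimPool(sessions: PooledSession[], maxPoolSize: number): PooledSession[] {
  const live = sessions.filter((s) => !s.retired);
  if (live.length <= maxPoolSize) return live;
  return [...live]
    .sort((a, b) => a.usageCount - b.usageCount)
    .slice(0, maxPoolSize);
}
```

With such a step, a merge that briefly pushes the pool past the limit would shrink back to the limit on the next write.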
-
We need to make the current session management more resilient and more compatible with concurrent actor runs, and optimize its reads and writes (especially session retirements) for this use case, before we can come up with a more sophisticated final solution. Currently, performance and efficiency suffer at both ends of the load spectrum for non-scraping use cases. During low demand, sequential runs often move along the trailing edge of the pool: the pool keeps replenishing missing sessions up to maxPoolSize without utilizing the rest of the pool. During high demand, persisting sessions at regular intervals, together with the pool's overall incompatibility with concurrent access, causes outdated data to be persisted or important changes (retired sessions) to be lost. Session management still works to some extent and manages to collect useful data on non-working proxies over longer periods of time, but the process is currently extremely inefficient for the reasons above.
Here is a proposal for a work-around patch to use until we can solve this in a better way. The goal is to stop sessions from being persisted automatically at intervals, so that read/write collisions are avoided as much as possible, and to reduce the collision surface through manual control:
pool is loaded at the start to get its current state
random session is picked and used during the run
session is marked before end of run or before retry
latest state of the pool is loaded again to bring it up to date
latest state of the pool is updated with the new session used
state of the pool is persisted for other concurrent runs to read
runs are collectively cleaning up the retired sessions at runtime
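The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked commit: `InMemoryStore`, `POOL_KEY`, `pickSession` and `commitSession` are hypothetical names, the store is synchronous and in-memory for brevity (a real shared key-value store would be asynchronous), and a session missing from the reloaded state is assumed to have been cleaned up by another run.

```typescript
// Hypothetical session shape and pool state; not taken from the Apify SDK.
interface Session {
  id: string;
  errorCount: number;
  retired: boolean;
}

type PoolState = Record<string, Session>;

// Stand-in for a shared store that concurrent runs read and write.
class InMemoryStore {
  private data = new Map<string, string>();

  get(key: string): PoolState {
    const raw = this.data.get(key);
    return raw ? (JSON.parse(raw) as PoolState) : {};
  }

  set(key: string, state: PoolState): void {
    this.data.set(key, JSON.stringify(state));
  }
}

const POOL_KEY = "SESSION_POOL_STATE"; // assumed key name

// Load the pool at the start of the run and pick a random usable session.
function pickSession(store: InMemoryStore): Session | undefined {
  const usable = Object.values(store.get(POOL_KEY)).filter((s) => !s.retired);
  if (usable.length === 0) return undefined;
  return usable[Math.floor(Math.random() * usable.length)];
}

// Called before the end of the run or before a retry, after the caller
// has marked the session (for example bumped errorCount or set retired).
function commitSession(store: InMemoryStore, used: Session): PoolState {
  // Reload the latest state so changes made by other runs are kept.
  const latest = store.get(POOL_KEY);
  const current = latest[used.id];

  // A missing session is assumed to have been cleaned up already, so it
  // is not written back. A retired session is never un-retired.
  if (current !== undefined) {
    latest[used.id] = { ...used, retired: used.retired || current.retired };
  }

  // Collective cleanup: every run drops retired sessions when it writes.
  for (const [id, session] of Object.entries(latest)) {
    if (session.retired) delete latest[id];
  }

  // Persist the state for other concurrent runs to read.
  store.set(POOL_KEY, latest);
  return latest;
}
```

The reload and the write are still two separate operations, so two runs committing at the same moment can overwrite each other; the sketch only narrows that window, it does not close it.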
https://gitlab.com/cybaerfly/apify-robot/-/commit/97e74136c0744dfc363e4a1b607920fe50c0882e
Please let me know how we can proceed to improve this solution and make it cleaner, and perhaps native. Maybe the session pool could have an option to disable persisting sessions at regular intervals; that would help a lot on its own. @AndreyBykov