Improve efficiency of concurrent session management #1346

cybairfly · 2021-06-01T15:20:02Z


We need to make the current session management more resilient so that it is usable with concurrent actor runs, and optimize reads and writes (especially session retirements) for this use case, before we can come up with a more sophisticated final solution. Currently, performance and efficiency suffer at both ends of the load spectrum for non-scraping use cases. During low demand, runs often move along the trailing edge of the pool: the pool keeps replenishing missing sessions up to maxPoolSize, and sequential runs never utilize the rest of the pool. During high demand, session persistence at regular intervals and the overall incompatibility with concurrent access to the pool cause outdated data to be persisted or important changes (retired sessions) to be lost. Session management still works to some extent and manages to collect useful data on non-working proxies over longer periods of time, but the process is currently extremely inefficient for the reasons above.

Here is a proposal for a workaround patch to use until we can solve this in a better way. The goal is to stop sessions from being persisted automatically at intervals, to avoid read/write collisions as much as possible, and to reduce the collision surface through manual control:

1. The pool is loaded at the start to get its current state.
2. A random session is picked and used during the run.
3. The session is marked before the end of the run or before a retry.
4. The latest state of the pool is loaded again to bring it up to date.
5. The latest state of the pool is updated with the session that was used.
6. The state of the pool is persisted for other concurrent runs to read.
7. Runs collectively clean up the retired sessions at runtime.
https://gitlab.com/cybaerfly/apify-robot/-/commit/97e74136c0744dfc363e4a1b607920fe50c0882e
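
To make the proposed flow concrete, here is a minimal sketch of the load, use, reload, merge, and persist cycle. It is not the code from the linked commit and does not use the Apify SDK: `SharedStore`, `SessionRecord`, `PoolState`, `commitSession`, `runOnce`, and the `SESSION_POOL_STATE` key are all hypothetical names, and the store is a synchronous in-memory stand-in for whatever shared storage the runs actually use.

```typescript
// Hypothetical shapes for illustration; none of these names come from the Apify SDK.
type SessionRecord = { id: string; errorScore: number; retired: boolean };
type PoolState = { sessions: SessionRecord[] };

// Stand-in for the shared storage that concurrent runs read and write.
// Synchronous and in-memory to keep the sketch short; real storage would be async.
class SharedStore {
  private data = new Map<string, string>();

  load(key: string): PoolState {
    const raw = this.data.get(key);
    return raw === undefined ? { sessions: [] } : (JSON.parse(raw) as PoolState);
  }

  save(key: string, state: PoolState): void {
    this.data.set(key, JSON.stringify(state));
  }
}

const POOL_KEY = "SESSION_POOL_STATE"; // assumed key name

// Reload the latest pool, merge in the single session this run touched,
// drop retired sessions, and persist the result for other runs.
function commitSession(store: SharedStore, used: SessionRecord): PoolState {
  const latest = store.load(POOL_KEY);
  const others = latest.sessions.filter((s) => s.id !== used.id);
  const next: PoolState = {
    sessions: [...others, used].filter((s) => !s.retired),
  };
  store.save(POOL_KEY, next);
  return next;
}

// Load the pool once, pick a random session, use it, mark it, then commit.
function runOnce(store: SharedStore, work: (s: SessionRecord) => boolean): PoolState {
  const initial = store.load(POOL_KEY);
  if (initial.sessions.length === 0) {
    throw new Error("Session pool is empty");
  }
  const index = Math.floor(Math.random() * initial.sessions.length);
  const picked: SessionRecord = { ...initial.sessions[index] };
  if (work(picked)) {
    picked.errorScore = Math.max(0, picked.errorScore - 1); // mark as good
  } else {
    picked.retired = true; // mark as retired; removed during the commit
  }
  return commitSession(store, picked);
}
```

Because each run writes back only the one session it touched on top of a freshly loaded pool, the window for a collision shrinks to the gap between the reload and the save. It does not disappear: two runs that reload at the same moment can still overwrite each other.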

Please let me know how we can proceed to improve this solution and make it cleaner, and perhaps native. Maybe the session pool could have an option to disable persisting sessions at regular intervals; that alone would help a lot. @AndreyBykov

cybairfly · 2021-06-01T15:20:26Z


I don't think we can do much better than this right now without having the actors talk to each other and share their view of the pool, but this approach should minimize collisions as much as possible by checking the current state before each write. I am wondering, though, how the pool will behave in the now rarer collision case, for example when the total number of sessions exceeds the max pool size. I can't think of any other serious implications right now.
