Thank you Georgios! #52

Open
lexciobotariu opened this issue May 13, 2024 · 4 comments

Comments

@lexciobotariu

I've been running the script with 5k queries for the last 10 hours, and it has got to the point where it is using over 200 GB of RAM; I've set it to use 35 cores.

It scraped over 300k businesses.

I'm just a bit worried that it won't finish the entire list of queries before crashing due to lack of RAM.
Any suggestions on how to continue the scraping once it crashes and pick up from where it left off?

@admbyz

admbyz commented May 18, 2024

Try fewer cores, or split your keywords and run the chunks one after another; I shared a script for that in the closed issue #35. Also make sure you are running the latest version.
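
For reference, a minimal sketch of that chunk-and-run idea in Python (this is not the actual script from #35; the binary name `./google-maps-scraper` and the `-input`/`-results`/`-c` flags are assumptions, so check the project README for the exact options):

```python
#!/usr/bin/env python3
"""Split a large keyword file into chunks and run the scraper on one chunk
at a time, so memory use stays bounded between runs."""
import subprocess
from pathlib import Path

CHUNK_SIZE = 500   # keywords per run; tune to your available RAM
CORES = 8          # fewer cores generally means less memory pressure

keywords = [k.strip() for k in Path("keywords.txt").read_text().splitlines() if k.strip()]

for n, start in enumerate(range(0, len(keywords), CHUNK_SIZE)):
    chunk_file = Path(f"chunk_{n:04d}.txt")
    chunk_file.write_text("\n".join(keywords[start:start + CHUNK_SIZE]) + "\n")

    # One results file per chunk, so a crash never clobbers earlier output.
    subprocess.run(
        ["./google-maps-scraper",
         "-input", str(chunk_file),   # flag names are assumptions -- verify against the README
         "-results", f"results_{n:04d}.csv",
         "-c", str(CORES)],
        check=True,
    )
```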

@gosom
Owner

gosom commented May 20, 2024

@lexciobotariu what was the outcome of this? Did you manage to scrape all your keywords?

@lexciobotariu
Author

Hello there, it did manage to scrape all the information, around 500k results.
@admbyz I have used your suggestion in the past and it was working perfectly.

@admbyz

admbyz commented May 27, 2024

Eh, I misunderstood your problem. What you are asking for seems really hard, though, because I don't think Google returns static results for the requests you make, so to resume, the program would also need to validate the data Google sends back. Skipping already-scraped data is certainly more performant, but in the end the total number of requests will be the same, unless you only check the exact URL and skip the entire result set.
I didn't pay attention to the terminal output, but maybe you can extract the processed queries from there, build a new keyword list (or remove the already-processed entries from your existing one), and, whenever the scraper is not running and the keyword list is not empty, run it again. Before that, though, you need to check whether the scraper appends results to the file after a restart or replaces them; if it doesn't append, you would have to create a new results file on every restart. I don't recommend handling the scraping this way; it's wonky and not reliable.
Running the scraper with fewer cores will be your best bet, I guess.
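
A rough sketch of that keyword-filtering idea, for anyone who wants to try it anyway. It assumes the results CSV records the originating query in a column (called `query` here purely for illustration; check what your output actually contains) and that keywords.txt holds one query per line:

```python
#!/usr/bin/env python3
"""Drop keywords whose queries already appear in the results file, then write
the remainder so the scraper can be restarted on just those."""
import csv
from pathlib import Path

done = set()
results = Path("results.csv")
if results.exists():
    with results.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # "query" is a placeholder column name -- adjust to your output format
            q = (row.get("query") or "").strip()
            if q:
                done.add(q)

keywords = [k.strip() for k in Path("keywords.txt").read_text().splitlines() if k.strip()]
remaining = [k for k in keywords if k not in done]

Path("keywords.remaining.txt").write_text("\n".join(remaining) + "\n")
print(f"{len(done)} queries already in results, {len(remaining)} keywords left to run")
```

As noted above, whether restarting like this is safe depends on whether the scraper appends to the existing results file or replaces it; if it replaces, point each restart at a fresh results file and merge afterwards.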
