read_html_live() memory "leak" #408

Open
rdelrossi opened this issue Apr 6, 2024 · 4 comments

Comments

@rdelrossi

rdelrossi commented Apr 6, 2024

I'm experimenting with read_html_live() using Vivaldi as the headless browser (I try to avoid installing Google Chrome).

It works great, but I've noticed that it leaves some Chromium cruft behind. Each time I run my R script, the macOS Activity Monitor shows a new instance of "Vivaldi Helper (Renderer)" that uses about 500 MB of memory and grows from there. Run the script too many times and, naturally, the whole computer grinds to a halt. When I force-kill the processes in Activity Monitor, the R console reports "[error] handle_read_frame error: websocketpp.transport:7 (End of File)".

I'm not sure if I'm missing some kind of clean-up step, or if this is a Vivaldi problem, or if this is an rvest bug, but I wanted to let you know, @hadley.

Later:
I've noticed that pairing payload <- rvest::read_html_live(url) with payload$session$close() addresses the problem I'm describing (i.e., the "Vivaldi Helper (Renderer)" process disappears from the Activity Monitor). Apologies if I missed the need for this in the docs.
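
The pairing described above can be wrapped so that the close call is never forgotten. This is a minimal sketch, not something rvest provides: `with_live_page()` and its `extract` and `reader` arguments are hypothetical names introduced here for illustration, and the only rvest-specific calls assumed are `read_html_live()` and the `$session$close()` call quoted in this comment.

```r
# Hypothetical helper (not part of rvest): open a live page, run `extract`
# on it, and always close the underlying browser session afterwards.
# `reader` defaults to rvest::read_html_live; it is a parameter only so the
# helper can be exercised without launching a browser.
with_live_page <- function(url, extract, reader = rvest::read_html_live) {
  page <- reader(url)
  # on.exit() runs when the function returns, including when `extract` errors,
  # so the renderer process is closed in both cases.
  on.exit(page$session$close(), add = TRUE)
  extract(page)
}

# Usage sketch (needs rvest and a Chromium-based browser, so it is commented out):
# headings <- with_live_page("https://example.com", function(page) {
#   rvest::html_text2(rvest::html_elements(page, "h1"))
# })
```

The `extract` function should return plain data (text, a data frame) rather than the live page itself, since the session is closed as soon as the helper returns.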

@hadley
Member

hadley commented Apr 8, 2024

If you rm(payload), then the garbage collector should close down the process a bit later.
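
As a rough illustration of why the cleanup happens "a bit later": R only runs an object's finalizer when the garbage collector actually collects it, which `rm()` alone does not trigger. The sketch below uses a plain environment with `reg.finalizer()` as a stand-in for the live page; it assumes (based on the comment above, not on rvest's source) that rvest relies on a similar mechanism. The names `state` and `payload` are illustrative only.

```r
# Stand-in for a live page: an environment whose finalizer records cleanup.
state <- new.env()
assign("closed", FALSE, envir = state)

payload <- new.env()
reg.finalizer(payload, function(e) assign("closed", TRUE, envir = state))

# Removing the last reference makes the object collectable, but the finalizer
# has not run yet at this point.
rm(payload)

# An explicit gc() requests a collection now instead of waiting for R to
# decide one is needed; pending finalizers run as part of it.
invisible(gc())
print(state$closed)

# With rvest the same pattern would look like this (assumption, not verified
# against rvest internals):
# payload <- rvest::read_html_live(url)
# rm(payload)
# gc()
```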

@rdelrossi
Author

I'll do that, thanks.

@rcepka

rcepka commented May 16, 2024

Just want to add my own experience.
I am on Windows 10, and using read_html_live() caused memory use to grow gradually until the computer eventually crashed. With each new page loaded with read_html_live(), I could watch a new Chrome task appear within the RStudio group in Task Manager, with memory usage rising until it consumed all of the computer's memory. Deleting the process did not help. I tried both methods mentioned in the posts above:
```r
rm(page)

page$session$close()
```
In the end I switched to the Selenider package to do the job, and I found it to be an excellent tool. It even closes the session automatically after returning from the function I created to load the web page and extract data from it.

@hadley
Member

hadley commented May 17, 2024

@rcepka can you provide any more details?

hadley reopened this May 17, 2024