gnews with user agent returns empty text #976

Open
wj210 opened this issue Oct 18, 2023 · 1 comment

wj210 commented Oct 18, 2023

I ran into errors when scraping articles found through gnews. The errors look like this:
Article download() failed with 403 Client Error: Max restarts limit reached for url
Article download() failed with 403 Client Error: Forbidden for url

So I followed https://github.com/johnbumgarner/newspaper3_usage_overview and added the user-agent headers, but as soon as I do that, article.text returns an empty string.

The links are Google News RSS article links, for example "https://news.google.com/rss/articles/CBMifWh0dHBzOi8vc2Vla2luZ2FscGhhLmNvbS9hcnRpY2xlLzE4NDM5MzItdGhlLWV4cGxhbmF0aW9uLWJlaGluZC1hcHBsZXMtZ3Jvc3MtbWFyZ2luLWRlY2xpbmUtYW5kLXdoeS10aGUtZnV0dXJlLWxvb2tzLWJyaWdodGVy0gEA?oc=5&hl=en-SG&gl=SG&ceid=SG:en"

With the same headers, the underlying link "https://seekingalpha.com/article/1843932-the-explanation-behind-apples-gross-margin-decline-and-why-the-future-looks-brighter" works fine.
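
For reference, the article ID in the example link above appears to be a URL-safe base64 string with the underlying URL embedded in it, so one possible workaround is to decode the ID and pass the underlying link to the scraper instead. The following is only a minimal sketch under that assumption: `extract_embedded_url` and `EXAMPLE_URL` are hypothetical names, this is not a documented Google News or gnews API, and links in other formats may not embed the URL at all (the function returns None in that case).

```python
import base64
import re
from urllib.parse import urlparse

# The Google News RSS link quoted in this issue.
EXAMPLE_URL = (
    "https://news.google.com/rss/articles/"
    "CBMifWh0dHBzOi8vc2Vla2luZ2FscGhhLmNvbS9hcnRpY2xlLzE4NDM5MzItdGhlLWV4"
    "cGxhbmF0aW9uLWJlaGluZC1hcHBsZXMtZ3Jvc3MtbWFyZ2luLWRlY2xpbmUtYW5kLXdo"
    "eS10aGUtZnV0dXJlLWxvb2tzLWJyaWdodGVy0gEA"
    "?oc=5&hl=en-SG&gl=SG&ceid=SG:en"
)


def extract_embedded_url(google_news_url):
    """Return the first http(s) URL embedded in the article ID, or None.

    Assumption: the last path segment is URL-safe base64 that contains
    the underlying URL as plain ASCII bytes.
    """
    # Last path segment is the article ID; the query string is ignored.
    article_id = urlparse(google_news_url).path.rsplit("/", 1)[-1]
    # Restore the base64 padding that the link omits.
    padded = article_id + "=" * (-len(article_id) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    # Take the first run of printable ASCII that starts with http(s)://
    match = re.search(rb"https?://[\x21-\x7e]+", raw)
    return match.group(0).decode("ascii") if match else None


print(extract_embedded_url(EXAMPLE_URL))
```

The returned string can then be passed to the scraper in place of the news.google.com link.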

@johnbumgarner

Thanks for mentioning my usage document in this Issue. What sites give you a 403?
