As mentioned in several issues (#645, #363), newspaper doesn't work on the New York Times.
I tested two versions of the New York Times: the English version and the Chinese version (https://cn.nytimes.com).
The Chinese version doesn't have a paywall, so newspaper should be able to extract its full content. However, in both cases newspaper extracts only 3 or 4 paragraphs, and they are not from the beginning of the article.
Is there any way I can solve this?
Thanks.
My code:
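from newspaper import Article, Config as NewspaperConfig
url = "https://www.nytimes.com/2019/08/21/business/economy/jobs-growth-revision.html"
conf = NewspaperConfig()
# keep_article_html=True is copied onto the config so article.article_html gets populated
article = Article(url, config=conf, keep_article_html=True)
article.download()  # fetch the raw page HTML
article.parse()     # run the extractor; fills article.text and article.article_html
print(article.article_html)
print(article.text)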
The URLs I tested with:
https://www.nytimes.com/2019/08/21/business/economy/jobs-growth-revision.html
https://cn.nytimes.com/china/20190821/china-hong-kong-social-media-soft-power/
https://cn.nytimes.com/morning-brief/20190822/hong-kong-protests-british-consulate-us-sanctions-fentanyl/
If it's any help, #885 works with your first URL. With the second URL, the last sentence is missing, and with the third URL I think a few more sentences are missing. I can't read Chinese, so I can't fully determine which sentences are missing here and there, but the linked PR captures more than the master branch. Hope it helps!
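You can check which paragraphs an extractor drops without reading every sentence, which helps with the Chinese pages. One way is to compare `article.text` against the `<p>` text inside the page's article-body container.

Below is a minimal standard-library sketch of that comparison. It is not part of newspaper. It assumes the body sits in an element carrying a hypothetical `name="articleBody"` attribute; check the page source, because the markup NYT actually serves may differ. `ArticleBodyParagraphs`, `body_paragraphs` and `missing_paragraphs` are illustrative names.

```python
from html.parser import HTMLParser


class ArticleBodyParagraphs(HTMLParser):
    """Collect the text of <p> elements nested inside an article-body container.

    ASSUMPTION: the container is any element whose attribute `attr` equals
    `value`; name="articleBody" is a hypothetical default, not confirmed markup.
    """

    def __init__(self, attr="name", value="articleBody"):
        super().__init__(convert_charrefs=True)
        self.attr, self.value = attr, value
        self.container_tag = None  # tag name of the matched container
        self.depth = 0             # nesting depth of container_tag; >0 means inside it
        self.in_p = False
        self.buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.container_tag:
                self.depth += 1           # same-named element nested in the container
            elif tag == "p":
                if self.in_p:             # implicitly closed <p>: keep what we had
                    self._flush()
                self.in_p, self.buf = True, []
            elif tag == "br" and self.in_p:
                self.buf.append(" ")      # keep lines separated by <br> apart
        elif (self.attr, self.value) in attrs:
            self.container_tag, self.depth = tag, 1

    def handle_endtag(self, tag):
        closing_container = tag == self.container_tag and self.depth == 1
        if self.in_p and (tag == "p" or closing_container):
            self._flush()
        if self.depth and tag == self.container_tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)

    def _flush(self):
        text = " ".join("".join(self.buf).split())  # collapse runs of whitespace
        if text:
            self.paragraphs.append(text)
        self.in_p, self.buf = False, []


def body_paragraphs(html, attr="name", value="articleBody"):
    """Return the whitespace-normalized paragraphs found in the assumed container."""
    parser = ArticleBodyParagraphs(attr, value)
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def missing_paragraphs(page_paragraphs, extracted_text):
    """Return the page paragraphs that do not appear verbatim in extracted_text."""
    normalized = " ".join(extracted_text.split())
    return [p for p in page_paragraphs if p not in normalized]
```

With newspaper, you would call `missing_paragraphs(body_paragraphs(article.html), article.text)` after `article.download()` and `article.parse()`. The result lists the paragraphs the extractor skipped, and you can run it the same way against master and against the #885 branch.

If `body_paragraphs` itself returns an empty list, the assumed container attribute does not match that page.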