Irrelevant content getting scraped #538

kushagrasharma-13 · 2024-03-03T14:04:38Z

The web content scraped from the URL provided in "01-defining-data-science" includes irrelevant information such as navigation, random articles, and references, which causes errors when extracting insights and building the word cloud.

I would like a solution that extracts only the necessary, relevant content for further processing.

We can use BeautifulSoup instead of HTMLParser and use its features to extract only the relevant content.
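A rough sketch of what the BeautifulSoup approach could look like, not a finished patch. The function name `extract_main_text`, the `NOISE_TAGS` list, and the default container id `mw-content-text` are my assumptions (that id is what MediaWiki pages typically use for the article body); they would need adjusting to the page the lesson actually fetches.

```python
from bs4 import BeautifulSoup

# Assumption: these tags mostly hold page chrome (menus, sidebars,
# reference markers) rather than article text.
NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside", "table", "sup"]


def extract_main_text(html, container_id="mw-content-text"):
    """Return only the paragraph text of the main content area.

    container_id is a hypothetical default; if no element with that id
    exists, fall back to <body>, then to the whole document.
    """
    soup = BeautifulSoup(html, "html.parser")
    root = soup.find(id=container_id) or soup.body or soup

    # Detach noisy elements so their text never reaches the output.
    for tag in root.find_all(NOISE_TAGS):
        tag.extract()

    # Keep non-empty paragraphs only, one per line.
    paragraphs = [p.get_text(" ", strip=True) for p in root.find_all("p")]
    return "\n".join(p for p in paragraphs if p)
```

The returned string could then be passed to the keyword extraction and word cloud steps in place of the raw parser output. Restricting the output to `<p>` elements inside a single container is a deliberately blunt filter: it drops navigation and reference lists, but it also drops headings and list items, so it may need loosening if those matter.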

Irrelevant Content:

Relevant Content
