scrapy-webscrapper-for-bbcnews

A scrapy webscraper that can scrape articles and videos of bbc

Running the spiders

To create a scrapy project:

scrapy startproject bbcnews
cd bbcnews
scrapy genspider bbc bbc.com

Start scrapping

scrapy crawl bbc

Save the data in a json file

scrapy crawl bbc -o bbc.json

This will crawl bbc and save the data in a file called bbc.json.

The structure of bbc spider

This spider can mainly scrape the articles and media contents from the home page of bbc. It has two items, a Bbcnewsitem and a Bbcmediaitem.

The extracted data of a Bbcnewsitem is in this form:

{
  "title": "Mars: Pictures reveal 'winter wonderland' on the red planet", 
  "url": "https://www.bbc.com/news/science-environment-46645321", 
  "type": "Science & Environment", 
  "time": "21 December 2018", 
  "related_topics": "Mars"
}

The extracted data of a Bbcmediaitem is in this form:

{
  "url": "/news/stories-46633914", 
  "title": "When a child experiment goes wrong", 
  "type": "Stories"
}

There is also a log file that can be used for debug. Write anything that you want in the log file by importing logging:

import logging

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
bbcnews		bbcnews
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bbcnews

bbcnews

README.md

README.md

Repository files navigation

scrapy-webscrapper-for-bbcnews

Running the spiders

To create a scrapy project:

Start scrapping

Save the data in a json file

The structure of bbc spider

About

Releases

Packages

Languages

zhangshyue/scrapy-webscrapper-for-bbcnews

Folders and files

Latest commit

History

bbcnews

bbcnews

README.md

README.md

Repository files navigation

scrapy-webscrapper-for-bbcnews

Running the spiders

To create a scrapy project:

Start scrapping

Save the data in a json file

The structure of bbc spider

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages