A simple Scrapy script for crawling Reuters news articles (Python 3)

zaemyung/crawl-reuters

Crawl-Reuters

Usage

  1. Install Scrapy: pip install Scrapy
  2. Modify the code in ./crawler/crawler/spiders/reuters_spider.py to suit your needs
  3. From the Scrapy project directory (the one containing scrapy.cfg, normally ./crawler), run the spider: scrapy crawl reuters

For more details on running Scrapy spiders, see the Scrapy Tutorial in the official Scrapy documentation.

Output

The crawled articles for each day are saved as a JSON file at ./crawler/crawled/*year*/*month*/*date*.json
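
As a rough illustration of the path layout above, the sketch below maps a date string to the file where that day's articles would be expected. The helper name `daily_file` and the constant `CRAWLED_ROOT` are hypothetical and not part of this repository. It also assumes that *date* in the path means a zero-padded day of the month and that *month* is zero-padded, which the README does not state.

```python
from datetime import datetime
from pathlib import Path

# Assumption: output root as described in the README, relative to the repo root.
CRAWLED_ROOT = Path("crawler") / "crawled"


def daily_file(date_str, root=CRAWLED_ROOT):
    """Map a YYYYMMDD string to the expected per-day JSON path.

    Assumption: month and day directories/files are zero-padded.
    """
    # strptime raises ValueError if the string is not a valid YYYYMMDD date.
    day = datetime.strptime(date_str, "%Y%m%d")
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}.json"


print(daily_file("20161113").as_posix())  # crawler/crawled/2016/11/13.json
```

If the spider writes unpadded names (for example `2016/1/5.json`), adjust the format strings accordingly.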

JSON Format

{
    "text": ["This is the first sentence of the article.", "The second sentence is here"],
    "section": "Politics",
    "title": "Reuters News Articles Crawled",
    "date": "20161113"
}
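
A minimal sketch of reading articles in this format with the standard library follows. The names `parse_articles` and `article_body` are hypothetical helpers, not part of this repository. The README does not say whether a daily file holds a single object or a list of objects, so the sketch accepts either; treat that as an assumption to check against your own output.

```python
import json

# Keys shown in the JSON format above.
REQUIRED_KEYS = {"text", "section", "title", "date"}


def parse_articles(raw):
    """Parse the contents of one daily file into a list of article dicts."""
    data = json.loads(raw)
    # Assumption: a file holds either one article object or a list of them.
    articles = data if isinstance(data, list) else [data]
    for article in articles:
        missing = REQUIRED_KEYS - set(article)
        if missing:
            raise ValueError(f"article is missing keys: {sorted(missing)}")
    return articles


def article_body(article):
    """Join the list of sentences in "text" into a single string."""
    return " ".join(article["text"])


sample = """
{
    "text": ["This is the first sentence of the article.", "The second sentence is here"],
    "section": "Politics",
    "title": "Reuters News Articles Crawled",
    "date": "20161113"
}
"""

articles = parse_articles(sample)
print(articles[0]["title"])       # Reuters News Articles Crawled
print(article_body(articles[0]))  # This is the first sentence of the article. The second sentence is here
```

To read a real file, pass its contents to `parse_articles`, for example `parse_articles(open(path, encoding="utf-8").read())`.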
