This repository has been archived by the owner on Jan 30, 2019. It is now read-only.

reubano/cookiecutter-collector

cookiecutter-collector

A Python data collector Cookiecutter template. The scraper is designed to work in a ScraperWiki "box", but it can be deployed in virtually any Unix environment. For detailed documentation on creating and managing scrapers on ScraperWiki, please refer to its official documentation.

Usage

Generate a new collector:

cookiecutter https://github.com/reubano/cookiecutter-collector.git

Then:

Collector Structure

The default way to use ScraperWiki is to store data in a SQLite database named scraperwiki.sqlite in the user's root directory. This enables features such as an interactive SQL query tool, an HTML table view with filters, and API endpoints for making remote SQL queries.
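As a rough illustration of that storage convention, here is a minimal sketch that writes rows to a SQLite file using only Python's standard `sqlite3` module. It is not the template's own code: the `save_records` helper, the `data` table name, and the `(name, value)` columns are all hypothetical, and it assumes that "the user's root directory" means the home directory.

```python
import sqlite3
from pathlib import Path


def save_records(db_path, records, table="data"):
    """Insert (name, value) rows and return the table's row count.

    Hypothetical helper: the table name and columns are illustrative only.
    """
    # The table name is interpolated into SQL, so allow plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")

    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} "
                "(name TEXT PRIMARY KEY, value REAL)"
            )
            # Re-running the collector overwrites rows with the same name.
            conn.executemany(
                f"INSERT OR REPLACE INTO {table} (name, value) VALUES (?, ?)",
                records,
            )
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumption: the database lives in the current user's home directory.
    db_file = Path.home() / "scraperwiki.sqlite"
    print(save_records(db_file, [("alpha", 1.0), ("beta", 2.5)]))
```

Keying rows on `name` with `INSERT OR REPLACE` keeps repeated runs idempotent, which tends to suit a collector that is run on a schedule.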

The folder structure is as follows:

collector-skeleton
    +---LICENSE
    +---Makefile
    +---README.md
    +---app
    |   +---__init__.py
    |   +---models.py
    |   +---utils.py
    +---bin
    |   +---check-stage
    |   +---upload
    |   +---setup
    +---config.py
    +---dev-requirements.txt
    +---http
    |   +---index.html
    +---manage.py
    +---requirements.txt
    +---setup.cfg
    +---setup.py
    +---tests
        +---__init__.py
        +---standard.rc
        +---test.sh
  • manage.py contains the main script commands.
  • config.py contains the configuration settings.
  • http generally contains an index.html file with the summary of the scraping task and any other files that are intended to be available through an API endpoint, such as a log.txt file.
  • app contains the collector model and initialization.
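To show how a file such as log.txt might end up in the http folder, the sketch below sets up a standard-library logger that appends to `http/log.txt`. The `get_file_logger` helper and the log format are assumptions made for illustration and are not taken from the template.

```python
import logging
from pathlib import Path


def get_file_logger(http_dir="http", name="collector"):
    """Return a logger that appends to <http_dir>/log.txt.

    Hypothetical helper: the function name and log format are illustrative.
    """
    http_dir = Path(http_dir)
    http_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Attach the file handler only once, even if this is called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(http_dir / "log.txt")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = get_file_logger()
    log.info("collected %d records", 42)
```

Anything written this way sits alongside index.html, so it would be served with the rest of the http folder.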

Looking for collector examples?

Want to contribute?

I will gladly accept pull requests if they improve the collector development experience.
