# github-crawler

A friendly GitHub crawler.

## Setup

1. Install requirements:

   ```bash
   pip install -r requirement.txt
   ```

2. Update the source URL as per your need in `github/github/spiders/github-user.py` (see the example after this list):

   ```python
   def start_requests(self):
       urls = [
           "your search url here"
       ]
   ```
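For illustration, a GitHub user-search URL placed in `urls` could look like the following; the search query is a hypothetical example, not a value from this repository:

```python
urls = [
    # Hypothetical search: GitHub users matching the keyword "scrapy".
    "https://github.com/search?q=scrapy&type=users"
]
```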

### For CSV (default)

Set the following variables in `settings.py`:

```python
ITEM_PIPELINES = {
   'GithubCsvPipeline': 300,
}
```
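For reference, here is a minimal sketch of what such a CSV pipeline could look like, assuming Scrapy's standard item-pipeline interface; the output file name and the item fields are hypothetical, not taken from this repository:

```python
import csv

class GithubCsvPipeline:
    """Hypothetical sketch of a CSV item pipeline (not the repo's actual code)."""

    def open_spider(self, spider):
        # Open the output file once when the spider starts.
        self.file = open("github_users.csv", "w", newline="")
        self.writer = csv.writer(self.file)

    def process_item(self, item, spider):
        # Write one row per scraped item; adapt the fields to your item definition.
        self.writer.writerow([item.get("username"), item.get("url")])
        return item

    def close_spider(self, spider):
        self.file.close()
```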

### For Elasticsearch

Set the following variables in `settings.py`:

```python
ELASTICSEARCH_HOST = ''
ELASTICSEARCH_PORT = 9200
ITEM_PIPELINES = {
   'GithubElasticsearchPipeline': 300,
}
```

Note: This option requires the index to already exist on the Elasticsearch server.
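Creating the index up front could look like the sketch below, assuming the elasticsearch-py client; the host URL and the index name "github-users" are hypothetical placeholders:

```python
from elasticsearch import Elasticsearch

# Connect to the server configured in settings.py (placeholder host shown)
# and create the index before running the crawl.
es = Elasticsearch("http://localhost:9200")
es.indices.create(index="github-users")
```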

### For Google Sheets

1. Set the following variables in `settings.py`:

   ```python
   GOOGLE_SHEET = ""
   ITEM_PIPELINES = {
      'github.pipeline.GithubExcelPipeline': 300,
   }
   ```

2. Store the Google API credentials in `utility/gsheets_credentials.json`.

Note: This option requires an existing Google Sheet with its sharing permission set to "Editable by anyone who has the link".
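As a rough sketch of how the credentials and sheet come together, assuming a gspread-style client (the sheet URL and the appended row are hypothetical):

```python
import gspread

# Authorize with the service-account credentials stored at the repo path.
gc = gspread.service_account(filename="utility/gsheets_credentials.json")

# Open the sheet configured as GOOGLE_SHEET and append a hypothetical row.
sheet = gc.open_by_url("https://docs.google.com/spreadsheets/d/...").sheet1
sheet.append_row(["username", "profile_url"])
```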

## Run instructions

```bash
cd github
scrapy crawl github-user-search
```
