Add ScrapydWeb to Python/Scrapy; Remove trailing whitespace. #46

Open: wants to merge 1 commit into base: master
17 changes: 9 additions & 8 deletions README.md
@@ -17,12 +17,13 @@ A collection of awesome web crawler,spider and resources in different languages.
- [Go](#go)
- [Scala](#scala)

-## Python
+## Python
* [Scrapy](https://github.com/scrapy/scrapy) - A fast high-level screen scraping and web crawling framework.
* [django-dynamic-scraper](https://github.com/holgerd77/django-dynamic-scraper) - Creating Scrapy scrapers via the Django admin interface.
* [Scrapy-Redis](https://github.com/rolando/scrapy-redis) - Redis-based components for Scrapy.
* [scrapy-cluster](https://github.com/istresearch/scrapy-cluster) - Uses Redis and Kafka to create a distributed on demand scraping cluster.
* [distribute_crawler](https://github.com/gnemoug/distribute_crawler) - Uses Scrapy, Redis, MongoDB and Graphite to create a distributed spider.
+* [ScrapydWeb](https://github.com/my8100/scrapydweb) - A full-featured web UI for Scrapyd cluster management, which supports Scrapy Log Analysis & Visualization, Auto Packaging, Timer Tasks, Email Notice and so on.
* [pyspider](https://github.com/binux/pyspider) - A powerful spider system.
* [CoCrawler](https://github.com/cocrawler/cocrawler) - A versatile web crawler built using modern tools and concurrency.
* [cola](https://github.com/chineking/cola) - A distributed crawling framework.
@@ -35,14 +36,14 @@ A collection of awesome web crawler,spider and resources in different languages.
* [portia](https://github.com/scrapinghub/portia) - Visual scraping for Scrapy.
* [crawley](https://github.com/jmg/crawley) - Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
* [RoboBrowser](https://github.com/jmcarp/robobrowser) - A simple, Pythonic library for browsing the web without a standalone web browser.
-* [MSpider](https://github.com/manning23/MSpider) - A simple ,easy spider using gevent and js render.
+* [MSpider](https://github.com/manning23/MSpider) - A simple ,easy spider using gevent and js render.
* [brownant](https://github.com/douban/brownant) - A lightweight web data extracting framework.
* [PSpider](https://github.com/xianhu/PSpider) - A simple spider frame in Python3.
* [Gain](https://github.com/gaojiuli/gain) - Web crawling framework based on asyncio for everyone.
* [sukhoi](https://github.com/iogf/sukhoi) - Minimalist and powerful Web Crawler.
-* [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
+* [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
* [newspaper](https://github.com/codelucas/newspaper) - News, full-text, and article metadata extraction in Python 3
-* [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.
+* [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.

## Java
* [ACHE Crawler](https://github.com/ViDA-NYU/ache) - An easy to use web crawler for domain-specific search.
@@ -64,7 +65,7 @@ A collection of awesome web crawler,spider and resources in different languages.
* [webBee](https://github.com/pkwenda/webBee) - A DFS web spider.


-## C#
+## C#
* [ccrawler](http://www.findbestopensource.com/product/ccrawler) - Built in C# 3.5. It contains a simple extension of a web content categorizer, which can separate web pages depending on their content.
* [SimpleCrawler](https://github.com/lei-zhu/SimpleCrawler) - Simple spider based on multithreading and regular expressions.
* [DotnetSpider](https://github.com/zlzforever/DotnetSpider) - A cross-platform, lightweight spider developed in C#.
@@ -82,7 +83,7 @@ A collection of awesome web crawler,spider and resources in different languages.
* [x-ray](https://github.com/lapwinglabs/x-ray) - Web scraper with pagination and crawler support.
* [node-osmosis](https://github.com/rchipka/node-osmosis) - HTML/XML parser and web scraper for Node.js.
* [web-scraper-chrome-extension](https://github.com/martinsbalodis/web-scraper-chrome-extension) - Web data extraction tool implemented as chrome extension.
-* [supercrawler](https://github.com/brendonboshell/supercrawler) - Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
+* [supercrawler](https://github.com/brendonboshell/supercrawler) - Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
* [headless-chrome-crawler](https://github.com/yujiosaka/headless-chrome-crawler) - Headless Chrome crawls with jQuery support

## PHP
@@ -111,7 +112,7 @@ A collection of awesome web crawler,spider and resources in different languages.
## R
* [rvest](https://github.com/hadley/rvest) - Simple web scraping for R.

-## Erlang
+## Erlang
* [ebot](https://github.com/matteoredaelli/ebot) - A scalable, distributed and highly configurable web crawler.

## Perl
@@ -121,7 +122,7 @@ A collection of awesome web crawler,spider and resources in different languages.
* [pholcus](https://github.com/henrylee2cn/pholcus) - A distributed, high concurrency and powerful web crawler.
* [gocrawl](https://github.com/PuerkitoBio/gocrawl) - Polite, slim and concurrent web crawler.
* [fetchbot](https://github.com/PuerkitoBio/fetchbot) - A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
-* [go_spider](https://github.com/hu17889/go_spider) - An awesome Go concurrent Crawler(spider) framework.
+* [go_spider](https://github.com/hu17889/go_spider) - An awesome Go concurrent Crawler(spider) framework.
* [dht](https://github.com/shiyanhui/dht) - BitTorrent DHT Protocol && DHT Spider.
* [ants-go](https://github.com/wcong/ants-go) - An open source, distributed, RESTful crawler engine in Golang.
* [scrape](https://github.com/yhat/scrape) - A simple, higher level interface for Go web scraping.