Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix typos: platfro to platform and ligt to light #97

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Empty file modified .gitignore
100644 → 100755
Empty file.
Empty file modified CONTRIBUTING.md
100644 → 100755
Empty file.
Empty file modified LICENSE
100644 → 100755
Empty file.
20 changes: 10 additions & 10 deletions README.md
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ A collection of awesome web crawler,spider and resources in different languages.
- [Go](#go)
- [Scala](#scala)

## Python
## Python
* [Scrapy](https://github.com/scrapy/scrapy) - A fast high-level screen scraping and web crawling framework.
* [django-dynamic-scraper](https://github.com/holgerd77/django-dynamic-scraper) - Creating Scrapy scrapers via the Django admin interface.
* [Scrapy-Redis](https://github.com/rolando/scrapy-redis) - Redis-based components for Scrapy.
Expand All @@ -35,14 +35,14 @@ A collection of awesome web crawler,spider and resources in different languages.
* [portia](https://github.com/scrapinghub/portia) - Visual scraping for Scrapy.
* [crawley](https://github.com/jmg/crawley) - Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
* [RoboBrowser](https://github.com/jmcarp/robobrowser) - A simple, Pythonic library for browsing the web without a standalone web browser.
* [MSpider](https://github.com/manning23/MSpider) - A simple ,easy spider using gevent and js render.
* [MSpider](https://github.com/manning23/MSpider) - A simple ,easy spider using gevent and js render.
* [brownant](https://github.com/douban/brownant) - A lightweight web data extracting framework.
* [PSpider](https://github.com/xianhu/PSpider) - A simple spider frame in Python3.
* [Gain](https://github.com/gaojiuli/gain) - Web crawling framework based on asyncio for everyone.
* [sukhoi](https://github.com/iogf/sukhoi) - Minimalist and powerful Web Crawler.
* [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
* [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
* [newspaper](https://github.com/codelucas/newspaper) - News, full-text, and article metadata extraction in Python 3
* [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.
* [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.

## Java
* [ACHE Crawler](https://github.com/ViDA-NYU/ache) - An easy to use web crawler for domain-specific search.
Expand All @@ -66,10 +66,10 @@ A collection of awesome web crawler,spider and resources in different languages.
* [Norconex Web Crawler](https://github.com/Norconex/collector-http) - Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repository of your choice (e.g. a search engine). Can be used as a stand alone application or be embedded into Java applications.


## C#
## C#
* [ccrawler](http://www.findbestopensource.com/product/ccrawler) - Built in C# 3.5 version. it contains a simple extension of web content categorizer, which can separate between the web page depending on their content.
* [SimpleCrawler](https://github.com/lei-zhu/SimpleCrawler) - Simple spider base on mutithreading, regluar expression.
* [DotnetSpider](https://github.com/zlzforever/DotnetSpider) - This is a cross platfrom, ligth spider develop by C#.
* [DotnetSpider](https://github.com/zlzforever/DotnetSpider) - This is a cross platform, light spider develop by C#.
* [Abot](https://github.com/sjdirect/abot) - C# web crawler built for speed and flexibility.
* [Hawk](https://github.com/ferventdesert/Hawk) - Advanced Crawler and ETL tool written in C#/WPF.
* [SkyScraper](https://github.com/JonCanning/SkyScraper) - An asynchronous web scraper / web crawler using async / await and Reactive Extensions.
Expand All @@ -85,10 +85,10 @@ A collection of awesome web crawler,spider and resources in different languages.
* [x-ray](https://github.com/lapwinglabs/x-ray) - Web scraper with pagination and crawler support.
* [node-osmosis](https://github.com/rchipka/node-osmosis) - HTML/XML parser and web scraper for Node.js.
* [web-scraper-chrome-extension](https://github.com/martinsbalodis/web-scraper-chrome-extension) - Web data extraction tool implemented as chrome extension.
* [supercrawler](https://github.com/brendonboshell/supercrawler) - Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
* [supercrawler](https://github.com/brendonboshell/supercrawler) - Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
* [headless-chrome-crawler](https://github.com/yujiosaka/headless-chrome-crawler) - Headless Chrome crawls with jQuery support
* [Squidwarc](https://github.com/n0tan3rd/squidwarc) - High fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
* [crawlee](https://github.com/apify/crawlee) - A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.
* [crawlee](https://github.com/apify/crawlee) - A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.


## PHP
Expand Down Expand Up @@ -124,7 +124,7 @@ A collection of awesome web crawler,spider and resources in different languages.
## R
* [rvest](https://github.com/hadley/rvest) - Simple web scraping for R.

## Erlang
## Erlang
* [ebot](https://github.com/matteoredaelli/ebot) - A scalable, distribuited and highly configurable web cawler.

## Perl
Expand All @@ -134,7 +134,7 @@ A collection of awesome web crawler,spider and resources in different languages.
* [pholcus](https://github.com/henrylee2cn/pholcus) - A distributed, high concurrency and powerful web crawler.
* [gocrawl](https://github.com/PuerkitoBio/gocrawl) - Polite, slim and concurrent web crawler.
* [fetchbot](https://github.com/PuerkitoBio/fetchbot) - A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
* [go_spider](https://github.com/hu17889/go_spider) - An awesome Go concurrent Crawler(spider) framework.
* [go_spider](https://github.com/hu17889/go_spider) - An awesome Go concurrent Crawler(spider) framework.
* [dht](https://github.com/shiyanhui/dht) - BitTorrent DHT Protocol && DHT Spider.
* [ants-go](https://github.com/wcong/ants-go) - A open source, distributed, restful crawler engine in golang.
* [scrape](https://github.com/yhat/scrape) - A simple, higher level interface for Go web scraping.
Expand Down