This repository has been archived by the owner on Jun 10, 2024. It is now read-only.

Releases: binux/pyspider

First PyPI Release

11 Jan 05:38
  • Many bugs fixed.
  • pyspider is now a single top-level package. (thanks to zbb, iamtew and fmueller from HN)
  • Python 3 support!
  • Use click to provide a better command-line interface.
  • PostgreSQL supported via SQLAlchemy (through SQLAlchemy, pyspider also supports Oracle, SQL Server, etc.).
  • Benchmark test.
  • Documentation & tutorial: http://docs.pyspider.org/
  • Flake8 cleanup (thanks to @jtwaleson)

Base

  • Use MessagePack instead of pickle in the message queue.
  • Binary content in JSON data is encoded as a base64 string.
  • RabbitMQ lazy limit for better performance.
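
As an illustration of the base64 item above, here is a minimal sketch using only the Python standard library. The helper names and the "type"/"data" field layout are assumptions made for this example; they are not pyspider's actual wire format.

```python
import base64
import json


def encode_content(content):
    """Return a JSON-safe wrapper for a response body.

    Text passes through unchanged; bytes are wrapped as base64 text.
    The "type"/"data" field names are hypothetical.
    """
    if isinstance(content, bytes):
        return {"type": "base64", "data": base64.b64encode(content).decode("ascii")}
    return {"type": "text", "data": content}


def decode_content(payload):
    """Invert encode_content."""
    if payload["type"] == "base64":
        return base64.b64decode(payload["data"])
    return payload["data"]


raw = b"\x89PNG\r\n\x1a\n"  # bytes that are not valid UTF-8 text
wire = json.dumps(encode_content(raw))  # safe to send through a JSON channel
restored = decode_content(json.loads(wire))
```

The round trip leaves the bytes intact, which plain JSON cannot guarantee for arbitrary binary data.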

Scheduler

  • Never re-crawl a task with a negative age.
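
The rule above could be sketched roughly as follows. This is not the scheduler's real code: the function name, its parameters, and the convention that `None` means "never crawled" are assumptions for illustration.

```python
import time


def should_recrawl(age, last_crawl_time, now=None):
    """Decide whether an already-known task may be crawled again.

    age: validity period in seconds; a negative value means "never re-crawl".
    last_crawl_time: UNIX timestamp of the last crawl, or None if never crawled.
    """
    if last_crawl_time is None:
        return True  # a task that was never crawled is always eligible
    if age < 0:
        return False  # negative age: the stored result never expires
    now = time.time() if now is None else now
    return now - last_crawl_time > age
```

With this sketch, a task with `age=-1` that has been crawled once is skipped no matter how much time has passed.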

Fetcher

  • The proxy parameter supports the ip:port format.
  • Increase the default fetcher poolsize to 100.
  • PhantomJS returns the JS script result in Response.js_script_result.
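
For the ip:port proxy format, a hypothetical parser might look like the sketch below. The `parse_proxy` helper is an assumption for illustration and does not show how the fetcher itself handles the parameter.

```python
def parse_proxy(proxy):
    """Split an "ip:port" string into a (host, port) tuple.

    Raises ValueError when the string does not have that shape.
    """
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected ip:port, got %r" % (proxy,))
    return host, int(port)


host, port = parse_proxy("127.0.0.1:8080")
```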

Processor

  • Put multiple new tasks in one package, improving performance with RabbitMQ.
  • Do not store all of the headers on success.

Script

  • Add an interface, get_taskid, to generate the taskid from a task object.
  • Tasks are de-duplicated by project and taskid.
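
To show how a custom taskid interacts with de-duplication, here is a standalone sketch. It assumes the task is a dict with a "url" key and optional POST data under task["fetch"]["data"], and it hashes with MD5 via hashlib; the `is_duplicate` helper and the in-memory `seen` set are hypothetical and stand in for whatever the scheduler really does.

```python
import hashlib
import json


def get_taskid(task):
    """Build a taskid from the URL plus any POST data (assumed task layout)."""
    data = task.get("fetch", {}).get("data", "")
    key = task["url"] + json.dumps(data, sort_keys=True)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


seen = set()  # hypothetical in-memory record of (project, taskid) pairs


def is_duplicate(project, task):
    """Return True if this (project, taskid) pair was already seen."""
    key = (project, get_taskid(task))
    if key in seen:
        return True
    seen.add(key)
    return False


task = {"url": "http://example.com/search", "fetch": {"data": {"q": "a"}}}
first = is_duplicate("demo", task)   # first sighting
second = is_duplicate("demo", task)  # same project, same taskid
```

Including the POST data in the hash means two requests to the same URL with different payloads are treated as distinct tasks, while the same task in two different projects is never merged.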

Webui

  • Project list is sortable.
  • Return a 404 page when dumping a project that does not exist.
  • Web preview supports images.

First Working Release

12 Nov 13:24
Pre-release

Base

  • MySQL and MongoDB backend support; a database URI can be used to set them up.
  • RabbitMQ as the queue for distributed deployment
  • Docker supported
  • support for Windows
  • support for Python 2.6
  • a resultdb, result_worker and WEBUI are added.
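
A database URI packs the backend type and connection details into one string. The sketch below splits such a string with the standard library; the example URI and the `parse_database_uri` helper are assumptions for illustration, and the exact URI scheme pyspider expects may differ.

```python
from urllib.parse import urlparse


def parse_database_uri(uri):
    """Break a database URI into its connection parameters."""
    parts = urlparse(uri)
    return {
        "engine": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Hypothetical URI, not taken from the release notes.
settings = parse_database_uri("mysql://user:secret@localhost:3306/taskdb")
```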

Scheduler

  • cronjob task supported
  • delete project supported

Fetcher

  • a PhantomJS fetcher is added. Now you can fetch pages that rely on JavaScript/AJAX!

Processor

  • send_message API to send messages to other projects
  • now you can import another project as a module via from projects import xxxx
  • @config helper for setting configs for a callback
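
One way a helper like @config can work is to attach the settings to the callback as an attribute so the framework can read them later. The following is a self-contained sketch of that pattern, not the library's implementation; the `_config` attribute name and the `age` setting are assumptions.

```python
def config(**kwargs):
    """Decorator that records per-callback settings on the function itself."""
    def wrapper(func):
        func._config = dict(kwargs)  # hypothetical attribute name
        return func
    return wrapper


@config(age=60)
def index_page(response):
    # The callback body is unchanged; only metadata is attached.
    return response


settings = index_page._config
```

Because the decorator returns the original function, the callback behaves exactly as before and the settings travel with it.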

WEBUI

  • a CSS selector helper is added to the debugger.
  • an option to switch the JS/CSS CDN.
  • a page of task history/config
  • a page of recent active tasks
  • pages of results
  • a demo mode is added for http://demo.pyspider.org/

Others

  • bug fixes
  • more tests; the coverage tool is now used.

First Runnable Release

09 Mar 03:08
Pre-release

A basic runnable system is finished, with:

  • sqlite3 task & project database
  • runnable scheduler & fetcher & processor
  • basic dashboard and debugger