
Releases: peterbencze/serritor

Serritor 2.1.1

11 Jun 12:42
1589393
  • Fix bug where crawl seeds were fed to the frontier twice, resulting in incorrect crawl stats
  • Fix bug where crawl stats were not reset when the crawler was restarted after its state was restored
  • Update dependency versions
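The seed bug above is a reminder that a frontier's counters are only as accurate as its duplicate handling. The following is a minimal sketch, not Serritor's actual fix: `DedupFrontierSketch`, `feed` and `getFeedCount` are hypothetical names, and the sketch assumes that URLs can be compared as plain strings. It shows one general way to make feeding the same seed twice harmless for the stats.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration; not part of the Serritor API.
final class DedupFrontierSketch {

    // URLs that have already been accepted by the frontier.
    private final Set<String> seen = new HashSet<>();
    // Pending crawl candidates in insertion order.
    private final Queue<String> queue = new ArrayDeque<>();
    // Stat that must not be inflated by duplicate feeds.
    private int feedCount;

    // Returns true only if the URL was new and has been queued.
    boolean feed(String url) {
        if (!seen.add(url)) {
            return false; // duplicate: leave the queue and the stats untouched
        }
        queue.add(url);
        feedCount++;
        return true;
    }

    int getFeedCount() {
        return feedCount;
    }

    public static void main(String[] args) {
        DedupFrontierSketch frontier = new DedupFrontierSketch();
        frontier.feed("https://example.com/");
        frontier.feed("https://example.com/"); // same seed fed a second time
        System.out.println(frontier.getFeedCount()); // prints 1
    }
}
```

Keeping the duplicate check and the counter update in the same method means the stat cannot drift from the queue contents, whichever caller feeds the seeds.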

Serritor 2.1.0

18 Jun 22:18
12b07fc

This release includes new features, improvements and changes to the existing API.

Changes in a nutshell:

  • Add helper class for finding text in response content
  • Refactor UrlFinder
  • Modify HTTP client so that it uses the same user-defined HTTP proxy as Selenium
  • Ignore authentication cookie when cookie authentication is not enabled
  • Use MutableCapabilities instead of DesiredCapabilities when configuring the browser

Serritor 2.0.0

30 May 20:29
eaac224

This major release includes a number of new features, bug fixes and changes to the existing API.
Changes in a nutshell:

  • Add internal proxy server to overcome Selenium limitations (no access to response headers etc.)
  • Add onBrowserInit callback to configure the browser before the crawling begins
  • Always call onStop even if an unhandled exception is thrown
  • Rename callbacks
  • Add detailed logging
  • Use SLF4J instead of the built-in logger
  • Add web API feature
    ... and more

Serritor 1.6.0

04 Nov 19:48
10ed6f8

This release adds the possibility to specify custom callbacks for crawl events.

Serritor 1.5.0

02 Sep 21:52
63d5c3f

This release includes bug fixes and a number of enhancements and new features.
Major changes in a nutshell:

  • Change the access modifier of the stop method
  • Add the possibility to download files
  • Add the possibility to retrieve response content type
  • Fix browser compatibility check exception when using HtmlUnitDriver
  • Add default URL finder creation method
  • Remove Selenium cookie synchronization
  • Add support for loading config from previously saved state
  • Add static methods for creating crawl requests with the default config

Serritor 1.4.0

23 Jun 14:13
3061e63

This release includes a number of bug fixes and improvements.

Serritor 1.3.1

22 Apr 15:51
a91dcf5

This release includes a new feature and changes to the existing API.
Changes in a nutshell:

  • Change how the crawler is configured:
    • Add CrawlerConfigurationBuilder for building CrawlerConfiguration instances
    • Pass the configuration to the crawler's constructor
  • Add the possibility to download the file in the onNonHtmlResponse callback

Please check the Wiki for more information.

Serritor 1.3.0

16 Mar 23:35
b28ac7d

This release includes new features, improvements and changes to the existing API.

New features in a nutshell:

  • Crawl domains: they specify the domains in which crawling is allowed
  • Crawl delay mechanisms: these can be used to determine the delay between each request
  • URL finder: it can be used to find URLs in HTML sources using regular expressions
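
To illustrate the idea behind the URL finder, here is a minimal sketch of regex-based URL extraction that uses only `java.util.regex`. It is not Serritor's implementation: `RegexUrlFinderSketch`, `findUrls` and the simplistic `href` pattern are assumptions made for this example, and real HTML usually calls for a proper parser before any pattern matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of the Serritor API.
final class RegexUrlFinderSketch {

    // Captures quoted href attribute values; deliberately simplistic.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns href values that fully match the caller-supplied URL pattern.
    static List<String> findUrls(String html, Pattern urlPattern) {
        List<String> urls = new ArrayList<>();
        Matcher hrefMatcher = HREF.matcher(html);
        while (hrefMatcher.find()) {
            String candidate = hrefMatcher.group(1);
            if (urlPattern.matcher(candidate).matches()) {
                urls.add(candidate);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://example.com/a\">A</a> "
                + "<a href='mailto:x@example.com'>M</a>";
        Pattern httpOnly = Pattern.compile("https?://\\S+");
        System.out.println(findUrls(html, httpOnly)); // prints [https://example.com/a]
    }
}
```

Passing the URL pattern as a parameter keeps the extraction step separate from the decision about which links are worth following.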

Please check the Wiki for more information.

Serritor 1.2.1

10 Feb 20:26
ebdafa5

This release includes minor fixes and improvements, including changes to the API. Please check the Wiki for more information.

Serritor 1.2

18 Jul 21:29

This release includes new features, bug fixes and major API modifications. Please check the documentation for more information.