Releases: brianmadden/krawler

Upgrade Kotlin and Coroutines Package

29 Jan 18:18
f6edb2d
Pre-release
  • Upgrade Kotlin to 1.3.61
  • Upgrade kotlinx.coroutines. This required an update to some of the places where coroutine builders were called internally.
  • Upgrade Gradle wrapper

Queue Priority; Clear Queues; Always On Support

26 Nov 00:08
c7f527c
  • Added the ability to clear crawl queues by RequestId and Age; see Krawler#removeUrlsByRootPage
    and Krawler#removeUrlsByAge
  • Added config option to prevent crawler shutdown on empty queues
  • Added a new single-byte priority field to KrawlQueueEntry. Queues will always attempt to pop the entry with the
    lowest priority value available. Priority can be assigned by overriding the Krawler#assignQueuePriorty method.
  • Update dependencies
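
The lowest-value-first behavior described above can be sketched roughly as follows. This is a minimal illustration and not Krawler's actual implementation: QueueEntry, LowestFirstQueue, and assignPriority are hypothetical names, the queue is held in memory, and the depth-based priority rule is only one possible policy.

```kotlin
import java.util.PriorityQueue

// Hypothetical stand-in for KrawlQueueEntry; only the fields needed here.
data class QueueEntry(val url: String, val priority: Byte = 0)

// Hypothetical in-memory queue that always pops the lowest priority value first.
class LowestFirstQueue {
    private val entries = PriorityQueue<QueueEntry>(compareBy<QueueEntry> { it.priority })

    fun push(entry: QueueEntry) {
        entries.add(entry)
    }

    // Returns null when the queue is empty.
    fun pop(): QueueEntry? = entries.poll()
}

// Hypothetical priority rule, standing in for an override of the assignment hook:
// URLs with fewer path segments get a lower value and are therefore popped sooner.
fun assignPriority(url: String): Byte {
    val depth = url.substringAfter("://").trimEnd('/').count { it == '/' }
    return depth.coerceAtMost(Byte.MAX_VALUE.toInt()).toByte()
}
```

With this rule a site's root page is popped before its deeper pages, regardless of insertion order.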

Remove Logger Implementation

16 Aug 05:04
Pre-release

0.4.1 (2017-8-15)

  • Removed logging implementation from dependencies to prevent logging conflicts when used as a library.
  • Updated Kotlin version to 1.1.4
  • Updated kotlinx.coroutines to .17

Coroutines!

16 May 14:35
Pre-release

0.4.0 (2017-5-17)

  • Rewrote the core crawl loop to use Kotlin 1.1 coroutines. This effectively turns the crawl process into a multi-stage pipeline. The new architecture removes the need for some locking, since multiple threads no longer contend for the same resources.

  • Updated the build file to build the simple example as a runnable jar

  • Minor bug fixes in the KrawlUrl class.

Kotlin 1.1, Logging, and Bug Fixes

03 Mar 21:42
Pre-release
  • Fixed a number of bugs that could crash a worker thread. A crashed thread led to an incorrect count of crawled pages
    and to slowdowns caused by the reduced number of worker threads.

  • Added a new utility function to wrap doCrawl and log any uncaught exceptions during crawling.
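
A wrapper of the kind described above might look roughly like the sketch below. It is an assumption-laden illustration, not the library's code: runLogged is a hypothetical name, the crawl step is passed in as a plain lambda instead of calling doCrawl, and java.util.logging is used only to keep the example dependency-free.

```kotlin
import java.util.logging.Level
import java.util.logging.Logger

// Assumption: a JDK logger, chosen only so the sketch needs no extra dependencies.
val crawlLogger: Logger = Logger.getLogger("CrawlWorker")

// Hypothetical wrapper: runs one crawl step and logs any exception instead of
// rethrowing it, so the calling worker keeps running. Returns true on success.
fun runLogged(url: String, crawl: (String) -> Unit): Boolean =
    try {
        crawl(url)
        true
    } catch (e: Exception) {
        crawlLogger.log(Level.SEVERE, "Uncaught exception while crawling $url", e)
        false
    }
```

Returning a Boolean lets the caller count failed pages separately, which addresses the miscounting problem mentioned in the first bullet.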

0.3.1 - Multi-queue and bug fixes

02 Feb 16:36
Pre-release
  • Created a 1:1 mapping between threads and the queues used to serve URLs to visit. URLs have an
    affinity for a particular queue based on their domain. All URLs from that domain will end up in the same
    queue. This improves parallel crawl performance by reducing how often the politeness delay
    affects requests. For crawls bound to fewer domains than queues, the excess queues are not used.
  • Many bug fixes, including a fix that eliminates accidental over-crawling.
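
One way to picture the domain affinity described above is to derive the queue index from the URL's host. The sketch below is a guess at the general idea and not the project's actual selection logic: queueIndexFor is a hypothetical helper, and hashing the host is an assumed strategy.

```kotlin
import java.net.URI

// Hypothetical helper: choose a queue index from the URL's host so that every
// URL from the same domain lands in the same queue.
fun queueIndexFor(url: String, queueCount: Int): Int {
    require(queueCount > 0) { "queueCount must be positive" }
    // Hosts are case-insensitive, so normalize before hashing; a missing host maps to "".
    val host = URI(url).host?.lowercase() ?: ""
    // floorMod keeps the index non-negative even when hashCode() is negative.
    return Math.floorMod(host.hashCode(), queueCount)
}
```

Under this scheme a crawl limited to a single domain only ever uses one queue, which matches the note that excess queues go unused. Two different domains can still hash to the same index, so it does not guarantee one domain per queue.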