Releases · brianmadden/krawler · GitHub

29 Jan 18:18

brianmadden

Upgrade Kotlin and Coroutines Package Pre-release

Upgrade Kotlin to 1.3.61
Upgrade kotlinx.coroutines. This required updating some of the internal call sites of coroutine builders.
Upgrade Gradle wrapper

26 Nov 00:08

brianmadden

Queue Priority; Clear Queues; Always On Support Pre-release

Added ability to clear crawl queues by RequestId and age; see Krawler#removeUrlsByRootPage and Krawler#removeUrlsByAge.
Added config option to prevent crawler shutdown on empty queues
Added a new single-byte priority field to KrawlQueueEntry. Queues always attempt to pop the lowest-priority entry available. Priority can be assigned by overriding the Krawler#assignQueuePriorty method.
Update dependencies
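
The queue changes above can be sketched roughly as follows. This is a minimal, standalone illustration and not Krawler's actual implementation: QueueEntry, SimpleKrawlQueue, and every field and method name here are hypothetical stand-ins, and it assumes that "lowest priority" means the smallest byte value is popped first.

```kotlin
import java.util.PriorityQueue

// Hypothetical stand-in for KrawlQueueEntry; the field names are assumptions.
data class QueueEntry(
    val url: String,
    val rootPageId: Int,       // assumed identifier of the request that seeded this URL
    val priority: Byte,        // single-byte priority; smaller value pops first (assumption)
    val timestampMillis: Long  // when the entry was queued
)

// Hypothetical in-memory queue; Krawler's real queues may be backed differently.
class SimpleKrawlQueue {
    // Order by priority value first, then by age so equal priorities pop oldest-first.
    private val entries = PriorityQueue<QueueEntry>(
        compareBy<QueueEntry>({ it.priority }, { it.timestampMillis })
    )

    fun push(entry: QueueEntry) {
        entries.add(entry)
    }

    // Returns the entry with the smallest priority value, or null if the queue is empty.
    fun pop(): QueueEntry? = entries.poll()

    // Drops every entry that originated from the given root page; returns how many were removed.
    fun removeByRootPage(rootPageId: Int): Int {
        val before = entries.size
        entries.removeIf { it.rootPageId == rootPageId }
        return before - entries.size
    }

    // Drops every entry queued before the cutoff; returns how many were removed.
    fun removeOlderThan(cutoffMillis: Long): Int {
        val before = entries.size
        entries.removeIf { it.timestampMillis < cutoffMillis }
        return before - entries.size
    }

    val size: Int get() = entries.size
}
```

Keeping the priority in a single byte keeps each entry small, and a comparator-backed queue means callers never need to sort explicitly.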

16 Aug 05:04

brianmadden

Remove Logger Implementation Pre-release

0.4.1 (2017-8-15)

Removed logging implementation from dependencies to prevent logging conflicts when used as a library.
Updated Kotlin version to 1.1.4
Updated kotlinx.coroutines to .17

16 May 14:35

brianmadden

Coroutines! Pre-release

0.4.0 (2017-5-17)

Rewrote the core crawl loop to use Kotlin 1.1 coroutines. This effectively turns the crawl process into a multi-stage pipeline. Because threads no longer contend for the same resources, some of the locking is no longer necessary.
Updated the build file to build the simple example as a runnable jar
Minor bug fixes in the KrawlUrl class.
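
The pipeline idea described above can be sketched with kotlinx.coroutines channels. This is only an illustration under stated assumptions: it requires the kotlinx-coroutines-core dependency and uses the current channel API rather than the one available in 2017, and Fetched, runPipeline, and the fake "fetch" step are hypothetical names that do not come from Krawler.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical payload passed between stages; not one of Krawler's types.
data class Fetched(val url: String, val body: String)

// Runs a three-stage pipeline and returns the URLs that reached the final stage.
fun runPipeline(seeds: List<String>): List<String> = runBlocking {
    val toFetch = Channel<String>(capacity = 16)
    val toParse = Channel<Fetched>(capacity = 16)
    val visited = mutableListOf<String>()

    // Stage 1: feed seed URLs into the pipeline, then signal completion.
    launch {
        for (url in seeds) toFetch.send(url)
        toFetch.close()
    }

    // Stage 2: stand-in for an HTTP fetch; a real crawler would do network I/O here.
    launch {
        for (url in toFetch) toParse.send(Fetched(url, "<html>$url</html>"))
        toParse.close()
    }

    // Stage 3: the only coroutine that touches `visited`, so no lock is needed.
    val sink = launch {
        for (page in toParse) visited.add(page.url)
    }

    sink.join()
    visited
}
```

Each stage owns its own state and hands work to the next stage through a channel, which is the property that lets shared-state locking be dropped.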

03 Mar 21:42

brianmadden

Kotlin 1.1, Logging, and Bug Fixes Pre-release

Fixed a number of bugs that could crash a worker thread. A crashed thread led to an incorrect count of crawled pages and to slowdowns, because fewer worker threads remained.
Added a new utility function to wrap doCrawl and log any uncaught exceptions during crawling.
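
A wrapper of this kind might look roughly like the sketch below. The name runLoggingFailures, its signature, and the use of java.util.logging are assumptions made for illustration; the release note does not show the actual utility function.

```kotlin
import java.util.logging.Level
import java.util.logging.Logger

val crawlLogger: Logger = Logger.getLogger("crawl")

// Hypothetical helper: runs `block`, logs any exception it throws, and returns null
// instead of letting the failure propagate and kill the calling worker thread.
fun <T> runLoggingFailures(taskName: String, block: () -> T): T? =
    try {
        block()
    } catch (e: Exception) {
        crawlLogger.log(Level.SEVERE, "Uncaught exception in $taskName", e)
        null
    }
```

A worker could then wrap each crawl step, for example `runLoggingFailures("doCrawl") { /* crawl work */ }`, so that a single bad page is logged rather than silently ending the thread.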

02 Feb 16:36

brianmadden

0.3.1 - Multi-queue and bug fixes Pre-release

Created a 1:1 mapping between threads and the queues used to serve URLs to visit. URLs have an
affinity for a particular queue based on their domain, so all URLs from a given domain end up in the same
queue. This improves parallel crawl performance by reducing how often the politeness delay
affects requests. For crawls bound to fewer domains than queues, the excess queues are not used.
Many bug fixes, including one that eliminates accidental over-crawling.
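
One simple way to give URLs a domain-based queue affinity is to hash the host name onto a queue index. The sketch below assumes that approach; the release note does not say how Krawler actually selects a queue, and queueIndexFor is a hypothetical name.

```kotlin
import java.net.URI

// Hypothetical helper: maps a URL to a queue index so that every URL
// from the same host lands in the same queue.
fun queueIndexFor(url: String, queueCount: Int): Int {
    require(queueCount > 0) { "queueCount must be positive" }
    // Fall back to the raw string if the URL has no parseable host.
    val host = URI(url).host?.lowercase() ?: url
    // floorMod keeps the index non-negative even when hashCode() is negative.
    return Math.floorMod(host.hashCode(), queueCount)
}
```

With a mapping like this, a crawl restricted to a single domain only ever touches one queue, which matches the note that excess queues go unused when there are fewer domains than queues.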
