Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queue Depth functionality + Panic with Async #774

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

KristinnVikar
Copy link

Description

This pull request adds support for Depth in Queue and adds a panic when attempting to use Async with Queue, as they are incompatible. The changes ensure that MaxDepth works as expected when using Queues.

I had to figure out by trial and error that Async caused my Queue based crawler to instantly finish since it thought the returned promises were 'completed' requests.

Changes Made

  1. Added a default value for Depth for Requests made in AddURL.
  2. Implemented a check in the AddRequest method to increase the depth for each nested request.
  3. Added a panic in the Queue.Run method to prevent users from using Async with the Queue, as it's not supported.

Checklist

  • Added unit tests for change

Please let me know if you require any changes to this Pull Request!
And if there's any interest I did make a UniqueInMemoryQueueStorage struct in my project that only stores non duplicate entries within the Queue.

Thank you!

@@ -158,7 +158,7 @@ func (r *Request) Retry() error {

// Do submits the request
func (r *Request) Do() error {
return r.collector.scrape(r.URL.String(), r.Method, r.Depth, r.Body, r.Ctx, *r.Headers, !r.collector.AllowURLRevisit)
return r.collector.scrape(r.URL.String(), r.Method, r.Depth+1, r.Body, r.Ctx, *r.Headers, !r.collector.AllowURLRevisit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change. Could you please clarify? The test you added doesn't catch it if I revert this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd, the TestCollectorDepth test should catch that, it's so that the depth increases on each request, so that you can use MaxDepth in Queue-based crawlers. I'll take a look at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants