fix: EnqueueStrategy.All erroring with links using unsupported protocols #2389

Merged
merged 2 commits into from May 15, 2024


@stefansundin (Contributor) commented Mar 22, 2024

This changes `EnqueueStrategy.All` to filter out URLs that do not use the `http:` or `https:` protocol (`mailto:` links were causing the crawler to error).

Let me know if there's a better fix or if you want me to change something.

Thanks!

Request failed and reached maximum retries. Error: Received one or more errors
    at _ArrayValidator.handle (/path/to/project/node_modules/@sapphire/shapeshift/src/validators/ArrayValidator.ts:102:17)
    at _ArrayValidator.parse (/path/to/project/node_modules/@sapphire/shapeshift/src/validators/BaseValidator.ts:103:2)
    at RequestQueueClient.batchAddRequests (/path/to/project/node_modules/@crawlee/src/resource-clients/request-queue.ts:340:36)
    at RequestQueue.addRequests (/path/to/project/node_modules/@crawlee/src/storages/request_provider.ts:238:46)
    at RequestQueue.addRequests (/path/to/project/node_modules/@crawlee/src/storages/request_queue.ts:304:22)
    at attemptToAddToQueueAndAddAnyUnprocessed (/path/to/project/node_modules/@crawlee/src/storages/request_provider.ts:302:42)
    at RequestQueue.addRequestsBatched (/path/to/project/node_modules/@crawlee/src/storages/request_provider.ts:319:37)
    at RequestQueue.addRequestsBatched (/path/to/project/node_modules/@crawlee/src/storages/request_queue.ts:309:22)
    at enqueueLinks (/path/to/project/node_modules/@crawlee/src/enqueue_links/enqueue_links.ts:384:2)
    at browserCrawlerEnqueueLinks (/path/to/project/node_modules/@crawlee/src/internals/browser-crawler.ts:777:21)
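The protocol filtering described above can be sketched in TypeScript. This is a minimal illustration, not the merged change. `filterSupportedUrls`, `parseOrNull`, and `SUPPORTED_PROTOCOLS` are hypothetical names, and the sketch assumes links are resolved against the page URL with the WHATWG `URL` class before they are enqueued.

```typescript
// Hypothetical sketch of filtering enqueued links by protocol.
// This is not Crawlee's actual implementation of EnqueueStrategy.All.
const SUPPORTED_PROTOCOLS = new Set(["http:", "https:"]);

// Returns null instead of throwing for values the URL parser rejects.
function parseOrNull(href: string, base: string): URL | null {
    try {
        return new URL(href, base);
    } catch {
        return null;
    }
}

// Keeps only links whose resolved protocol is http: or https:.
function filterSupportedUrls(hrefs: string[], baseUrl: string): string[] {
    const result: string[] = [];
    for (const href of hrefs) {
        // Resolve relative links against the page URL, as a crawler would.
        const parsed = parseOrNull(href, baseUrl);
        // URL.protocol is always lowercase and includes the trailing colon.
        if (parsed !== null && SUPPORTED_PROTOCOLS.has(parsed.protocol)) {
            result.push(parsed.href);
        }
    }
    return result;
}

const links = [
    "https://crawlee.dev/docs",
    "/about",
    "mailto:someone@example.com",
    "tel:+15555550100",
    "javascript:void(0)",
    "HTTP://EXAMPLE.COM/Path",
];

console.log(filterSupportedUrls(links, "https://example.com/index.html"));
```

With this sample input, the `mailto:`, `tel:`, and `javascript:` links are dropped. Three links remain: `https://crawlee.dev/docs`, `https://example.com/about` (resolved from `/about`), and `http://example.com/Path` (scheme and host lowercased by the parser). Comparing against `URL.protocol` is more robust than matching string prefixes because the parser normalizes the scheme's case.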

@stefansundin changed the title from "Fix EnqueueStrategy.All erroring with mailto: links" to "fix: EnqueueStrategy.All erroring with links using unsupported protocols" on Mar 22, 2024
@B4nan requested a review from @vladfrangu on March 22, 2024 08:09
@vladfrangu (Member) left a comment


Makes sense, can you also add a test case for this please? 🙏

@B4nan (Member) commented Mar 27, 2024

@stefansundin do you plan to finish this? I'd rather not merge such a change without any added tests.

@stefansundin (Contributor, Author) commented

Hi @B4nan. I started writing a test, but some more important work came up that took priority.

I may be able to finish it next week.

If you prefer then we can close this PR and open an issue instead.

@B4nan merged commit 8db3908 into apify:master on May 15, 2024
8 checks passed