Pages with 403 errors not throwing errors #343

Open
mrispoli24 opened this issue Mar 18, 2019 · 1 comment

@mrispoli24

What is the current behavior?

When you crawl a page that returns a 403 Forbidden response, the crawler hangs indefinitely. It ignores all timeouts and doesn't throw any errors.

If the current behavior is a bug, please provide the steps to reproduce

Run the current crawler from a remote server on DigitalOcean against a site that blocks bots. The 403 response the site returns does not trigger the error promise. This can be replicated with any Best Buy URL, for example.

What is the expected behavior?

Sites that return a 403 Forbidden response should trigger the onError function, and the crawler should move on to the next URL in the queue.
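To make the expected behavior concrete, here is a minimal sketch of a crawl loop that treats a 403 (or any other 4xx/5xx status) as a failure, guards each request with a timeout, reports the failure through an `onError` callback, and then continues with the next URL. This is not the crawler's actual implementation: `PageResult`, `Fetcher`, `isFailureStatus`, `withTimeout`, and `crawlAll` are hypothetical names made up for illustration, and the fetcher is passed in so the logic can be exercised without a browser or network.

```typescript
// Hypothetical shape of a fetched page; only the HTTP status matters here.
interface PageResult {
  url: string;
  status: number;
}

// Hypothetical function types, not part of any real crawler API.
type Fetcher = (url: string) => Promise<PageResult>;
type ErrorHandler = (url: string, error: Error) => void;

// Treat any 4xx or 5xx status (including 403 Forbidden) as a failure.
function isFailureStatus(status: number): boolean {
  return status >= 400 && status <= 599;
}

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Crawl URLs one at a time; failures are reported and never stop the loop.
async function crawlAll(
  urls: string[],
  fetchPage: Fetcher,
  onError: ErrorHandler,
  timeoutMs = 30000,
): Promise<PageResult[]> {
  const succeeded: PageResult[] = [];
  for (const url of urls) {
    try {
      const page = await withTimeout(fetchPage(url), timeoutMs);
      if (isFailureStatus(page.status)) {
        throw new Error(`HTTP ${page.status} for ${url}`);
      }
      succeeded.push(page);
    } catch (error) {
      // Report the failure, then fall through to the next URL.
      onError(url, error instanceof Error ? error : new Error(String(error)));
    }
  }
  return succeeded;
}
```

With a stub fetcher that resolves `{ status: 403 }` for one URL and never settles for another, `crawlAll` should call `onError` once for each of them and still return the pages that loaded normally.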

What is the motivation / use case for changing the behavior?

If a site implements this type of blocking, it halts your entire crawl without any notification that the URL failed.

@iamprageeth

To skip errors and let the script continue, you can use a Node.js version below 15.
