Pages with 403 errors not throwing errors #343

mrispoli24 · 2019-03-18T18:24:08Z

What is the current behavior?

When you crawl a page that returns a 403 Forbidden error, the crawler hangs indefinitely. It ignores all timeouts and doesn't throw any errors.

If the current behavior is a bug, please provide the steps to reproduce

If you run the current crawler from a remote server on DigitalOcean against sites that block bots, the returned 403 error does not trigger the error promise. This can be replicated with any Best Buy URL, for example.

What is the expected behavior?

Sites that return 403 Forbidden errors should trigger the onError function, and the crawler should then move on to the next URL to be crawled.
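
To make the expected behavior concrete, here is a minimal sketch of a crawl loop that treats any non-2xx status as an error, reports it, and continues. It is not the crawler's actual code: `statusToError`, `crawlAll`, `fetchStatus`, and `onSuccess` are hypothetical names, and the `onError(error, url)` signature is an assumption.

```javascript
// Hypothetical helpers for illustration; not part of the crawler's API.

// Turn a non-2xx HTTP status into an Error so every failure is handled the same way.
function statusToError(url, status) {
  if (status >= 200 && status < 300) return null;
  const error = new Error(`Request to ${url} failed with status ${status}`);
  error.status = status;
  error.url = url;
  return error;
}

// Crawl URLs one at a time. `fetchStatus(url)` is an assumed function that
// resolves to the HTTP status code of the page's main response.
async function crawlAll(urls, fetchStatus, { onSuccess, onError }) {
  for (const url of urls) {
    try {
      const status = await fetchStatus(url);
      const error = statusToError(url, status);
      if (error) throw error;
      onSuccess(url);
    } catch (error) {
      // Report the failure, then fall through to the next URL.
      onError(error, url);
    }
  }
}
```

With this shape, a 403 on one URL ends up in `onError` and the remaining URLs are still visited.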

What is the motivation / use case for changing the behavior?

If a site implements this type of blocking, it halts your entire crawl process without triggering any notification that the URL failed.
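
As a possible stopgap on the caller's side, each request could be wrapped in a hard timeout so that a request that never settles cannot stall the whole crawl. The sketch below uses `Promise.race`; `withTimeout` is a hypothetical helper, and it only stops waiting, it does not cancel the underlying request.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; clear the timer either way so it
  // cannot keep the process alive after the request finishes.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A promise that never settles stands in for a hung request.
const hungRequest = new Promise(() => {});
withTimeout(hungRequest, 50, 'https://example.com/')
  .catch((error) => console.error(error.message));
```

The rejection can then be routed to the same error handler as any other failed URL.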

iamprageeth · 2022-06-22T10:54:35Z

To skip errors and let the script continue, you can use a Node.js version below 15.
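
For context, Node.js 15 changed the default for unhandled promise rejections from a warning to an error that terminates the process. Assuming that is what stops the script on newer versions (the thread does not confirm this), registering an `unhandledRejection` handler is one way to keep the process running without downgrading. The `rejections` array is only there for illustration.

```javascript
// Assumption: the script stops on Node.js >= 15 because of an unhandled
// promise rejection. With a handler registered, Node.js calls the handler
// instead of terminating the process.
const rejections = [];

process.on('unhandledRejection', (reason) => {
  const message = reason instanceof Error ? reason.message : String(reason);
  rejections.push(message); // keep a record of what failed
  console.error(`Unhandled rejection: ${message}`);
});
```

This hides the symptom rather than fixing the cause, so handling the rejection where the request is made is still preferable.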

kulikalov added the feature label Oct 17, 2020
