
Ignore pages that have a 404 status code #82

Open
tbillington opened this issue May 10, 2020 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@tbillington

Currently suckit saves pages even when the web server reports them as not found. I think this is erroneous behaviour.

E.g. this page on my site, which returns a 404, was saved to disk.

Chrome dev tools:
[Screenshot: Screen Shot 2020-05-10 at 8 01 14 pm]

File explorer:
[Screenshot: Screen Shot 2020-05-10 at 8 01 22 pm]
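As a minimal sketch of the requested behaviour, assuming the crawler has the final HTTP status code of each response (after redirects have been followed), the save/skip decision could look like this. `PageAction` and `classify` are hypothetical names, not part of suckit.

```rust
/// Hypothetical outcome for a fetched page.
#[derive(Debug, PartialEq)]
enum PageAction {
    /// Write the page to disk.
    Save,
    /// Do not write the page; keep the status for logging.
    Skip { status: u16 },
}

/// Decide what to do with a page based only on its HTTP status.
/// Assumes redirects were already followed, so any non-2xx status
/// (404, 410, 500, an unfollowed 3xx, ...) is treated as "do not save".
fn classify(status: u16) -> PageAction {
    match status {
        200..=299 => PageAction::Save,
        _ => PageAction::Skip { status },
    }
}
```

This only looks at the status line; it would not catch pages that display a "not found" message with a 200 response, which comes up later in this thread.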

@Skallwar Skallwar added enhancement New feature or request good first issue Good for newcomers labels May 10, 2020
@Skallwar
Owner

We could have one 404 error page per website.
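One way to read this proposal, sketched under the assumption that 404 pages are grouped by host name: the first 404 seen on a host is written to a shared file, and every later 404 on that host reuses it. `NotFoundStore`, `path_for`, and the `<host>/404.html` layout are hypothetical.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical store keeping a single error page per host.
struct NotFoundStore {
    // host -> path of the one 404 page saved for that host
    pages: HashMap<String, PathBuf>,
}

impl NotFoundStore {
    fn new() -> Self {
        NotFoundStore { pages: HashMap::new() }
    }

    /// Returns the path every 404 on `host` should map to, plus `true`
    /// if this is the first 404 for that host (so the body must be written).
    fn path_for(&mut self, host: &str) -> (PathBuf, bool) {
        let is_new = !self.pages.contains_key(host);
        let path = self
            .pages
            .entry(host.to_string())
            .or_insert_with(|| PathBuf::from(host).join("404.html"))
            .clone();
        (path, is_new)
    }
}
```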

@tbillington
Author

As long as you're aware that's an opinionated choice :) Some sites have custom 404s per section of the site, some keep the original URL (as in my screenshot), some redirect to a dedicated 404 URL, and some show a 404 page with a 200 response. Web crawling is messy!

Perhaps this could be a configuration thing, but that's up to you :)
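If this became a configuration option, it might be modelled as an enum parsed from a command-line value. This is only an illustration: `NotFoundPolicy` and the strings `skip`, `save-all`, and `one-per-site` are invented here and are not existing suckit flags.

```rust
use std::str::FromStr;

/// Hypothetical policy for pages that come back as 404.
/// Not an existing suckit option; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NotFoundPolicy {
    /// Do not write 404 pages to disk at all.
    Skip,
    /// Write every 404 page (the behaviour reported in this issue).
    SaveAll,
    /// Write one shared 404 page per site.
    OnePerSite,
}

impl FromStr for NotFoundPolicy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "skip" => Ok(NotFoundPolicy::Skip),
            "save-all" => Ok(NotFoundPolicy::SaveAll),
            "one-per-site" => Ok(NotFoundPolicy::OnePerSite),
            other => Err(format!("unknown 404 policy: {}", other)),
        }
    }
}
```

Keeping the policy as an enum means each strategy discussed in the thread can be added as a variant without changing how the option is parsed.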

@Skallwar Skallwar added help wanted Extra attention is needed and removed good first issue Good for newcomers labels May 28, 2020
@Skallwar
Owner

Skallwar commented Jan 4, 2021

A good solution could be to hash each 404 or 200 page. That way, if the page is specific to its URL it is saved; if not, we could create a symbolic link to the generic one.
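A rough sketch of the hashing idea, assuming the crawler has each response body in memory: hash the body, write it the first time that hash is seen, and otherwise return the path of the earlier copy so a symbolic link can point to it. `Dedup`, `Store`, and `decide` are hypothetical; `DefaultHasher` is used only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum Store {
    /// First time this body is seen: write it to this path.
    Write(PathBuf),
    /// An identical body was already saved here: link to it instead.
    LinkTo(PathBuf),
}

/// Hypothetical content-based deduplicator.
/// Caveat: DefaultHasher output is not guaranteed stable across Rust
/// versions and 64-bit hashes can collide, so a real implementation would
/// likely use a cryptographic hash or compare bytes on a match.
struct Dedup {
    seen: HashMap<u64, PathBuf>,
}

impl Dedup {
    fn new() -> Self {
        Dedup { seen: HashMap::new() }
    }

    fn decide(&mut self, body: &[u8], path: &Path) -> Store {
        let mut hasher = DefaultHasher::new();
        body.hash(&mut hasher);
        let key = hasher.finish();
        match self.seen.get(&key) {
            Some(existing) => Store::LinkTo(existing.clone()),
            None => {
                self.seen.insert(key, path.to_path_buf());
                Store::Write(path.to_path_buf())
            }
        }
    }
}
```

On Unix, the `LinkTo` case could be turned into an actual link with `std::os::unix::fs::symlink(original, link)`; a relative `original` is resolved relative to the link's own directory. Pages that embed per-request data (timestamps, tokens) hash differently even when they are "the same" 404, so this only catches byte-identical pages.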

@tbillington
Author

Yeah, I think it's tricky. If it's legitimately just a bad link to a page that never existed, or an href that was relative when it shouldn't have been, you might hit an infinite loop (I've seen this in practice).
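A hedged sketch of two guards against this kind of loop, assuming the crawler knows the status code of the page a link was found on: do not follow links discovered on non-2xx pages, and reject link paths that are suspiciously deep or repeat the same segment back to back (e.g. `/docs/docs/docs/page`). `should_follow`, `looks_runaway`, and the limits 20 and 2 are illustrative assumptions.

```rust
/// Heuristic check for URL paths produced by a runaway relative href.
/// Returns true if the path has more than `max_depth` segments, or if the
/// same segment appears more than `max_repeats` times in a row.
fn looks_runaway(path: &str, max_depth: usize, max_repeats: usize) -> bool {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    if segments.len() > max_depth {
        return true;
    }
    // Track the current run of identical consecutive segments.
    let mut run = 1;
    for pair in segments.windows(2) {
        if pair[0] == pair[1] {
            run += 1;
            if run > max_repeats {
                return true;
            }
        } else {
            run = 1;
        }
    }
    false
}

/// Only enqueue links found on successful pages that also pass the heuristic.
fn should_follow(source_status: u16, link_path: &str) -> bool {
    (200..300).contains(&source_status) && !looks_runaway(link_path, 20, 2)
}
```

The repeated-segment check misses loops that repeat a multi-segment pattern (e.g. `/x/y/x/y/z`), which is why the depth limit is kept as a catch-all.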

@Skallwar
Owner

Skallwar commented Jan 5, 2021

Hmm, OK. We have more serious issues and very little time at the moment; we will give this a try later.

@tbillington
Author

Yeah, no rush :)
