Fix for continuation behaviour on broken dataset archives due to starving download connections via HTTP-GET #6380

Open, wants to merge 1 commit into base: main

RuntimeRacer

This PR proposes a (slightly hacky) fix for an issue that can occur when downloading large dataset parts over unstable connections.
The underlying issue is also discussed in #5594.

Issue symptoms and behaviour:

  • The download of a large archive file via HTTP GET fails partway through a dataset download.
  • A silent network exception (which I was unable to identify) is thrown inside the tqdm download progress loop.
  • Because nothing catches that exception, processing simply continues as if http_get had completed successfully.
  • The pending archive file is renamed to drop the .incomplete extension, even though not all of its data has been downloaded.
  • For reasons I did not investigate, there also seems to be no real integrity check for the downloaded files, or the check does not detect this problem. This is especially problematic because the downloader script won't retry downloading this archive after CRC checking, even when it is manually restarted after running into errors on extraction.

Fix proposal: add a retry mechanism for HTTP GET downloads, with the following behaviour:

  • The download progress thread checks that the downloaded size is valid, in case the HTTP connection starves mid-download. If the check fails, a RuntimeError is raised.
  • The cache downloader code, now with a retry mechanism, watches for an exception raised by the download progress thread and retries the download with an updated resume_size.
  • The cache downloader will not mark a file as complete if its download raised an exception and exceeded the allowed retries.
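
To make the three points above concrete, here is a minimal sketch of the idea. It is not the code in this PR and not the library's actual downloader. Apart from `resume_size`, every name in it (`IncompleteDownloadError`, `write_chunks`, `download_with_retries`, `fetch`, `max_retries`) is hypothetical. `fetch(resume_size)` stands in for an HTTP GET that resumes at a byte offset, for example via a `Range` header, and yields the remaining bytes in chunks; `expected_total` is assumed to be known up front, for example from the `Content-Length` of the first response.

```python
import os
from typing import Callable, Iterable, Optional


class IncompleteDownloadError(RuntimeError):
    """Hypothetical error: the stream ended before all bytes arrived."""


def write_chunks(chunks: Iterable[bytes], out_file, resume_size: int,
                 expected_total: Optional[int]) -> int:
    """Append chunks to out_file, then validate the total size."""
    written = resume_size
    for chunk in chunks:
        out_file.write(chunk)
        written += len(chunk)
    # A starved connection can end the stream early without raising,
    # so compare what we have against what the server announced.
    if expected_total is not None and written != expected_total:
        raise IncompleteDownloadError(
            f"got {written} of {expected_total} bytes")
    return written


def download_with_retries(fetch: Callable[[int], Iterable[bytes]],
                          incomplete_path: str, final_path: str,
                          expected_total: Optional[int],
                          max_retries: int = 3) -> str:
    """Retry a resumable download; rename only after a clean attempt."""
    last_error = None
    for _attempt in range(max_retries + 1):
        # Resume from whatever is already on disk.
        resume_size = (os.path.getsize(incomplete_path)
                       if os.path.exists(incomplete_path) else 0)
        try:
            with open(incomplete_path, "ab") as out_file:
                write_chunks(fetch(resume_size), out_file,
                             resume_size, expected_total)
        except (IncompleteDownloadError, OSError) as err:
            last_error = err
            continue
        # Only a fully validated file loses its .incomplete suffix.
        os.replace(incomplete_path, final_path)
        return final_path
    raise RuntimeError(
        f"download failed after {max_retries + 1} attempts") from last_error
```

The key design choice in this sketch is that the rename happens only on the success path, so a file that exhausted its retries keeps its `.incomplete` name and can be resumed on the next run.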

@RuntimeRacer RuntimeRacer changed the title Fix for continuation behaviour on broken dataset archives on starving download connections via HTTP-GET Fix for continuation behaviour on broken dataset archives due to starving download connections via HTTP-GET Nov 2, 2023