New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove tmp image if the download operation fails #1140
Conversation
What do you think is the safety profile of always deleting the file before starting? Is there a chance of premature promotion if there's bad concurrency? Do we stop that in the control plane? Do we stop it via how systemd units are named for background jobs? can we stop it with flock? Can we stop it one or more of those approaches for bug detection and defense in depth? The reason to not use a rescue like this is it relies on the program to run to get the result you want. |
I discussed this topic with Burak and will continue to work on it. Meanwhile, my other image download PRs are not specifically dependent on this one. |
85c88bf
to
d4150b6
Compare
9c23c56
to
230ddfb
Compare
9faf343
to
3d55c71
Compare
91d1e80
to
826a82a
Compare
It's just a structural refactor. No need extra indentation.
At present, we download the image to a temporary path and then move it to the final location using the `qemu-img convert` command. However, this process is not atomic and can be time-consuming. It would be more efficient to use the `File.rename` command to move the image directly to the final location, as this is an atomic operation. This method also allows other virtual machines to use the old image right up until the last moment.
If the boot image already exists, the virtual machine uses it. However, in certain situations like updating GitHub runner images, we need to download the latest version. If the force flag is set to true, the VM host will download the image regardless of its existence.
Right now, we use the "File::CREAT | File::EXCL" flags to open a temporary file for image downloads, which should stop multiple downloads happening at once. These flags only let one process create the file, and if the file is already there, it fails. However, if a download gets interrupted, the file sticks around and the next download attempt fails. It's tricky to figure out if the file is incomplete, so we have to delete it manually. Using flock to lock a lock file is a more effective way to stop multiple downloads idempotently at the same time.
The previous version of the `download_boot_image` method fails when the file already exists. I've incorporated a `force` flag into the method to overwrite any pre-existing file. This way, the program doesn't need to delete the existing file before downloading the boot image.
826a82a
to
b48eb69
Compare
@fdr I refactored this PR a lot. It uses flock. |
ack, digging in |
@@ -470,43 +470,43 @@ def download_boot_image(boot_image, custom_url: nil, ca_path: nil) | |||
|
|||
download = urls.fetch(boot_image) || custom_url | |||
image_path = vp.image_path(boot_image) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for the commit message and breaking this out; reviewing it with ?w=1 in github was great.
Exit early if the image already exists
It's just a structural refactor. No need extra indentation.
Finalize the image download with File.rename
At present, we download the image to a temporary path and then move it to the final location using the
qemu-img convert
command. However, this process is not atomic and can be time-consuming. It would be more efficient to use theFile.rename
command to move the image directly to the final location, as this is an atomic operation. This method also allows other virtual machines to use the old image right up until the last moment.Add force flag to download image even if exists
If the boot image already exists, the virtual machine uses it. However, in certain situations like updating GitHub runner images, we need to download the latest version. If the force flag is set to true, the VM host will download the image regardless of its existence.
Allow only one concurrent image download with flock
Right now, we use the "File::CREAT | File::EXCL" flags to open a temporary file for image downloads, which should stop multiple downloads happening at once. These flags only let one process create the file, and if the file is already there, it fails. However, if a download gets interrupted, the file sticks around and the next download attempt fails. It's tricky to figure out if the file is incomplete, so we have to delete it manually. Using flock to lock a lock file is a more effective way to stop multiple downloads idempotently at the same time.
Use force flag in the boot image download prog
The previous version of the
download_boot_image
method fails when the file already exists. I've incorporated aforce
flag into the method to overwrite any pre-existing file. This way, the program doesn't need to delete the existing file before downloading the boot image.Fixes #911