Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taking Diff Snapshots is not transactional #4545

Closed
JonathanWoollett-Light opened this issue Apr 8, 2024 · 2 comments
Closed

Taking Diff Snapshots is not transactional #4545

JonathanWoollett-Light opened this issue Apr 8, 2024 · 2 comments
Assignees
Labels
Good first issue Indicates a good issue for first-time contributors

Comments

@JonathanWoollett-Light
Copy link
Contributor

Description:
To determine which parts of guest memory should be included in a differential snapshot, we have two "dirty bitmaps" to consult: The one maintained by KVM for tracking memory writes done by the guest, and one in firecracker for guest memory accesses done by firecracker. After taking a differential snapshot, we reset both of these bitmaps, such that the next snapshot will be a diff to the one we just took. This is done in two places

  1. Inside the KVM_GET_DIRTY_LOG ioctl, whose documentation states that it will only return the pages dirties since the last call to the ioctl, and
  2. Inside Firecracker's dump_dirty function, which resets our own bitmap.

However, we call KVM_GET_DIRTY_LOG before we start to write the snapshot file, and we clear Firecracker's bitmap in the middle of writing the snapshot (we write snapshots region-by-region, and clear the bitmap after dumping each individual region).

Should something go wrong while writing the snapshot file, we abort writing it and return an error. From this point onward, the dirty bitmaps are in an inconsistent state, and if one then retries taking a diff snapshot, it will be unusable (as both KVM's and firecracker's log will think the last diff snapshot taken was successful, and thus not include any pages that would have been included in that snapshot).

Acceptance Criteria:
The fix is in two parts

  1. Only clear Firecracker's dirty bitmaps after the snapshot has been successfully written
  2. If an error occurs during snapshot dumping, look at which pages KVM reported as dirty, and mark those as dirty in Firecracker's bitmap (as there is no KVM_SET_DIRTY_LOG ioctl, we have to keep track of this information outside of KVM now).

Then, we need a test that verifies the fix.

@JonathanWoollett-Light JonathanWoollett-Light added the Good first issue Indicates a good issue for first-time contributors label Apr 8, 2024
@JackThomson2
Copy link
Contributor

I'm working on this, branch can be seen here: https://github.com/JackThomson2/firecracker/tree/feat/diff_snapshot_transaction

@roypat
Copy link
Contributor

roypat commented May 16, 2024

Fixed by #4580

@roypat roypat closed this as completed May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good first issue Indicates a good issue for first-time contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants