Add option to abort VMM when parent process dies #4172

jseba · 2023-10-12T16:45:52Z

Changes

This adds a flag to set the parent death signal (SIGUSR2 in this instance) that the process will receive when its parent process exits before the VMM does. Receipt of this signal will cause the VMM to abruptly abort, much like the SIGILL signal. While a graceful shutdown would be preferable, since the parent process may have been controlling outside resources for Firecracker (disks, networking, etc.), it's indeterminate whether or not it is safe to continue running the VM.

Reason

If Firecracker is being monitored by a parent process that unexpectedly terminates, it will be abandoned up the process tree, likely to a process that doesn't know what do with it (such as init). This becomes even trickier if the process was running in a mount namespace that was controlled by the parent process, as the API socket is now inaccessible.

If the parent process was also keeping handles on other resources used by the Firecracker VMM, these could be re-used by new processes and cause conflicts with the now orphaned Firecracker.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

If a specific issue led to this PR, this PR closes the issue.
The description of changes is clear and encompassing.
Any required documentation changes (code and docs) are included in this PR.
API changes follow the Runbook for Firecracker API changes.
User-facing changes are mentioned in CHANGELOG.md.
All added/changed functionality is tested.
New TODOs link to an issue.
Commits meet contribution quality standards.

This functionality cannot be added in rust-vmm.

bchalios · 2023-10-16T12:04:39Z

Hi @jseba,

Thanks for the contribution. I have some questions to understand a bit more the context:

You send SIGUSR2 to let Firecracker know that its parent process died. Please, correct me if I'm wrong, but this is not a standard mechanism, i.e. processes do not receive SIGUSR2 when their parent process dies. So, why not just send SIGINT to kill the Firecracker process?

jseba · 2023-10-17T17:41:48Z

I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler

bchalios · 2023-10-18T07:51:45Z

Thanks for the explanation @jseba.

I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death

Just to clarify, the signal does not arrive from the OS, IOW it is not the kernel that sends the signal, it is some monitoring process (I guess), right?

in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler

I don't think that reserving a signal number is worth it for the sole purpose of logging the reason why Firecracker was shutdown. Is there any other cleanup/actions you have in mind?

jseba · 2023-10-25T16:23:23Z

Thanks for the explanation @jseba.

I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death

Just to clarify, the signal does not arrive from the OS, IOW it is not the kernel that sends the signal, it is some monitoring process (I guess), right?

No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.

in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler

I don't think that reserving a signal number is worth it for the sole purpose of logging the reason why Firecracker was shutdown. Is there any other cleanup/actions you have in mind?

The log message was the only thing I could think of off hand behavior wise, but my thought process was that if the parent process is still alive and firecracker gets SIGTERM, even from another process, the parent process will still be running to get the result from waitpid, so firecracker can gracefully shut itself down. If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.

If Firecracker is being monitored by a parent process that unexpectedly terminates, it will be abandoned up the process tree, likely to a process that doesn't know what do with it (such as init). This becomes even trickier if the process was running in a mount namespace that was controlled by the parent process, as the API socket is now inaccessible. If the parent process was also keeping handles on other resources used by the Firecracker VMM, these could be re-used by new processes and cause conflicts with the now orphaned Firecracker. This adds a flag to set the parent death signal (SIGUSR2 in this instance) that the process will receive when its parent process exits before the VMM does. Receipt of this signal will cause the VMM to abruptly abort, much like the SIGILL signal. While a graceful shutdown would be preferable, since the parent process may have been controlling outside resources for Firecracker (disks, networking, etc.), it's indeterminate whether or not it is safe to continue running the VM. Signed-off-by: Josh Seba <jseba@cloudflare.com>

jseba · 2023-11-08T22:51:30Z

I fixed the merge conflict and rebased back onto the tip of main

bchalios · 2023-11-09T09:41:34Z

No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.

So, you are saying that when a process dies, Linux sends by itself SIGUSR2 to all its children processes? I was not aware of this mechanism. Could you share a link that documents that?

If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.

Like what? What is the worst thing that can happen? And why can this happen without Firecracker's involvement?

jseba · 2023-12-12T23:21:22Z

No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.

So, you are saying that when a process dies, Linux sends by itself SIGUSR2 to all its children processes? I was not aware of this mechanism. Could you share a link that documents that?

Linux by default sends nothing. The PR_SET_PDEATHSIG option tells it to send the designated signal to the calling process when the calling process's parent dies.

If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.

Like what? What is the worst thing that can happen? And why can this happen without Firecracker's involvement?

The most obvious causes would be either a crash or an OOM kill from the kernel. Either way, the process terminates immediately with no chance for graceful cleanup. In our case, one of things being managed is the network namespace, device and address used for the Firecracker VM. If the parent process dies, the networking configuration used for the VM will be released to be used by another VM while the now-abandoned one continues running. This means another VM can be started on the host with the network configuration that the VM was using, meaning the abandoned VM will no longer have network connectivity. It is not safe to assume that simply terminating the VM gracefully will work in a timely manner, as services running in the VM could attempt to flush data and hang waiting for responses.

bchalios · 2023-12-18T11:52:34Z

Linux by default sends nothing. The PR_SET_PDEATHSIG option tells it to send the designated signal to the calling process when the calling process's parent dies.

Thanks, I didn't know about this mechanism :)

Either way, the process terminates immediately with no chance for graceful cleanup

The graceful clean-up in this case just logging about the fact we are shutting down due to the parent process dying, other than that it's completely equivalent to just sending a SIGTERM to Firecracker.

It is not safe to assume that simply terminating the VM gracefully will work in a timely manner, as services running in the VM could attempt to flush data and hang waiting for responses.

That is true, but my point is that I don't think we need to change Firecracker itself to achieve this. In fact, I don't think we should change Firecracker at all since there's nothing that the Firecracker process needs to do other than shutting down.

What if, for example, you called prctl after you fork from the current process but before exec'ing in the Firecracker?

jseba force-pushed the parent_death_sig branch from c1164ce to bf00b57 Compare November 8, 2023 22:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to abort VMM when parent process dies #4172

Add option to abort VMM when parent process dies #4172

jseba commented Oct 12, 2023

bchalios commented Oct 16, 2023

jseba commented Oct 17, 2023

bchalios commented Oct 18, 2023

jseba commented Oct 25, 2023

jseba commented Nov 8, 2023

bchalios commented Nov 9, 2023

jseba commented Dec 12, 2023

bchalios commented Dec 18, 2023

Add option to abort VMM when parent process dies #4172

Are you sure you want to change the base?

Add option to abort VMM when parent process dies #4172

Conversation

jseba commented Oct 12, 2023

Changes

Reason

License Acceptance

PR Checklist

bchalios commented Oct 16, 2023

jseba commented Oct 17, 2023

bchalios commented Oct 18, 2023

jseba commented Oct 25, 2023

jseba commented Nov 8, 2023

bchalios commented Nov 9, 2023

jseba commented Dec 12, 2023

bchalios commented Dec 18, 2023