ExecSync did not return according to the timeout set in the request #10094

rtheis · 2024-04-19T13:36:40Z

Description

While both kubernetes/kubernetes#123931 and #9568 have been closed, the problem still exists.

Steps to reproduce the issue

See kubernetes/kubernetes#123931 (comment) for the recreate steps.

Describe the results you received and expected

Exec timeout is honored.

What version of containerd are you using?

1.7.15

Any other relevant information

No response

Show configuration if it is related to CRI plugin.

No response

fuweid · 2024-04-19T15:34:25Z

Sorry for missing the comment in closed #9568. @tallclair @rtheis

I would like to explain more detail about shim IO handler first.

The following chart is about setns process's stdout dataflow.
When setns-process exits, we don't want to have any log loss issue so we will wait for IO closed.
The pipe+w writer should be closed after setns-process exited.
After pipe writer is closed, since there is only one writer, the pipe reader will return EOF in shim side.
And then shim closes the fifo+w and CRI plugin will get the EOF from fifo+r.

+--------------------------------+      +-----------shim----------+   +--------CRI--------+
|                                |      |                         |   |                   |
| setns-process/stdout (pipe+w)----------->(pipe+r)---->(fifo+w)-------->(fifo+r)-->FILE  |
|                                |      |                         |   |                   |
+--------------------------------+      +-------------------------+   +-------------------+

However, setns-process can have child processes or grandchild processes.
When ExecSync timeouts, CRI only kills setns-process.
Since we don't have sub-cgroup to track all the processes created by the setns-process, we can't force-stop these processes. So, these processes will hold pipe+w file descriptor. Based on current design in shim, shim will wait for EOF from pipe+r.

In 'sh', '-c', 'date >> /probe/readiness-log.txt && sleep 45' case, we kill the sh but sleep 45 is still holding the writer's file descriptor. That's why it takes so long.

Currently, I don't have better solution to cleanup all the processes created by setns-process without introducing sub-cgroup for each exec call. So, I introduces a drain_exec_sync_io_timeout in #7832.

By default, there is no timeout to drain IO. If user wants to fastfail after timeout, we can set it to 10ms.

Side note: The setns-process and init container should take responsibility to cleanup the orphaned processes #10002 (comment)

rtheis · 2024-04-25T14:32:51Z

@fuweid thank you. That works as you've noted. I'm concerned with changing the value since you all left the default value to effectively disable the timeout. I'd like a reasonable timeout but don't want data loss either. What is your recommendation?

fuweid · 2024-04-25T16:06:31Z

ping @dmcgowan @mikebrow

I think that the default value could be 3-5 seconds.

In #3361, we use 2 seconds to drain IO. 3-5 seconds looks safe

jokemanfire · 2024-05-27T08:31:17Z

I think it is the same problem.
test environment:
containerd 1.7.11
rustshim latest

test function:
containerd : TestContainerDrainExecIOAfterExit

It will exec exceed the time, and it will report a fail.In delete function. If the shim return no error , but the io still used by runc fork process , io.wait will wait long time.
May be add a time maximum time limit in io.wait ?
Fifo file write is not close will cause this problem.
rust-shim looks like use fifo directly , have no pipe in it.
friendly ping @fuweid , thanks

fuweid · 2024-05-27T09:14:43Z

Hi @jokemanfire ,

rust-shim looks like use fifo directly , have no pipe in it.

Yes, the rust-shim opens fifo directly. Without sub-cgroup for each exec process, I don't have good solution to handle this in CRI-side. Ping @abel-von @Burning1020 for help on rust-shim issue, since they are maintainers on rust-shim.

jokemanfire · 2024-05-27T13:31:33Z

Hi @fuweid @Burning1020 @abel-von
I try to resolve this problem in rust-shim at the beginning . But I think use fifo directly can reduce the copy-io 's performance loss now (I think it is a good way). How about add a timeout to io.wait to avoid waiting forever? not only for rust-shim, but for every occupation IO cases.
I think when containerd send a delete request to shim. It defaults that Shim can disable the writing end of IO. So it used io wait to wait for the remaining data to be written to completion. If it is necessary to keep waiting for the IO write end to close while deleting. Containerd believes too much in shim, I think this is not very reasonable.

Burning1020 · 2024-05-27T13:55:19Z

Hi @fuweid @Burning1020 @abel-von I try to resolve this problem in rust-shim at the beginning . But I think use fifo directly can reduce the copy-io 's performance loss now (I think it is a good way). How about add a timeout to io.wait to avoid waiting forever? not only for rust-shim, but for every occupation IO cases. I think when containerd send a delete request to shim. It defaults that Shim can disable the writing end of IO. So it used io wait to wait for the remaining data to be written to completion. If it is necessary to keep waiting for the IO write end to close while deleting. Containerd believes too much in shim, I think this is not very reasonable.

@jokemanfire I have tried to solve this problem in the two directions, but all failed:

Let shim "tell" container process to close its write side, but it is impossiable.
Use copy-io just for exec io. Even if the read side of pipe was cloased, the rust copy-io thread was still waiting for IO, as it is a synchronized syscall different with Goruntime.

So basically, I think there is no practical solution for rust-shimv2. As you mentioned to add a timeout to io.wait, maybe it could work.

fuweid · 2024-05-27T14:16:03Z

It defaults that Shim can disable the writing end of IO

When container holds fifo directly, the write side of fifo must be hold by the processes created by exec-init process.
It's unlikely to close it. The timeout is just for WaitIO progress. Without close event of fifo, the cri-containerd is still waiting for EOF from container. I believe that rust-shim still needs pipe and forwards bytes into fifo.

But I think use fifo directly can reduce the copy-io 's performance loss now (I think it is a good way).

Just my two cents. Correctness is first priority. And if there is performance data, it will be more convincible.

jokemanfire · 2024-05-28T01:52:21Z

How about try to use copy-io first , it may not same with go-shim to implementation copy-io @Burning1020 . I will try to do this thing, resolve issues that differ from goruntime implementation。

Use copy-io just for exec io. Even if the read side of pipe was cloased, the rust copy-io thread was still waiting for IO, as it is a synchronized syscall different with Goruntime.

And show the test performance data , if the performance is worth it . And I think change process.io.wait is more valuable. @fuweid
But I still think that process.io.wait must wait for the write-side have been closed .There are no restrictions on it here ,I think it's a bit unreasonable. Thanks.

fuweid · 2024-05-28T01:58:04Z

@jokemanfire

And show the test performance data , if the performance is worth it . And I think change process.io.wait is more valuable

Sorry I didn't follow you here. The write side of fifo is hold by processes in container instead of shim, while the CRI-containerd holds the read side of fifo. Even if you have timeout policy, the write side of fifo is still open. How does CRI-Containerd get the closed event? If you already have POC, please file pull request in rust-shim side, thanks

jokemanfire · 2024-05-28T02:37:46Z

Thanks for your patience , I know that the process is runc to start that , and hold the fifo write (such like sh -c sleep 365d), But I don't understand the containerd if it necessary to wait forever for the write end of IO to close after delete rpc request send ?That's my think point . The shim process pid file have been remove , but the io file wirte will still used by start process. containerd just close the read side will cause any problem? @fuweid

fuweid · 2024-05-30T02:30:15Z

Thanks for your patience , I know that the process is runc to start that , and hold the fifo write (such like sh -c sleep 365d), But I don't understand the containerd if it necessary to wait forever for the write end of IO to close after delete rpc request send

@jokemanfire Sorry for missing this comment. I have article to describe fifo things for that in Chinese https://fuweid.com/post/2022-embedshim-kernel-is-my-sidecar/ (google translation for English should work :joy ). Hope it can help.

Both containerd and containerd-shim don't wait for it forever.
The containerd is waiting for closed event for that fifo, which triggerd by containerd-shim.

When we call the delete call to shim, containerd-shim will wait for it in 2 seconds and close it. And then containerd will get the notification.

REF: https://github.com/containerd/containerd/pull/3691/files

jokemanfire · 2024-05-30T02:44:14Z

Thanks for your patience , I know that the process is runc to start that , and hold the fifo write (such like sh -c sleep 365d), But I don't understand the containerd if it necessary to wait forever for the write end of IO to close after delete rpc request send

@jokemanfire Sorry for missing this comment. I have article to describe fifo things for that in Chinese https://fuweid.com/post/2022-embedshim-kernel-is-my-sidecar/ (google translation for English should work :joy ). Hope it can help.

Both containerd and containerd-shim don't wait for it forever. The containerd is waiting for closed event for that fifo, which triggerd by containerd-shim.

Thank you very much , this is very useful to me.

REF: https://github.com/containerd/containerd/pull/3691/files

that code I have noticed , it shim close the pipe (If I didn't understand correctly) , but this is not applicable to ships that directly use FIFO 。 0.0

rtheis added the kind/bug label Apr 19, 2024

This was referenced Apr 19, 2024

ExecSync did not return according to the timeout set in the request #9568

Closed

readiness prober timeout do not run as expected kubernetes/kubernetes#123931

Closed

jokemanfire mentioned this issue May 29, 2024

Change fifo Io to PIPE in shim , just do like go shim containerd/rust-extensions#276

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExecSync did not return according to the timeout set in the request #10094

ExecSync did not return according to the timeout set in the request #10094

rtheis commented Apr 19, 2024

fuweid commented Apr 19, 2024

rtheis commented Apr 25, 2024 •

edited

fuweid commented Apr 25, 2024

jokemanfire commented May 27, 2024 •

edited

fuweid commented May 27, 2024

jokemanfire commented May 27, 2024

Burning1020 commented May 27, 2024 •

edited

fuweid commented May 27, 2024

jokemanfire commented May 28, 2024 •

edited

fuweid commented May 28, 2024

jokemanfire commented May 28, 2024

fuweid commented May 30, 2024

jokemanfire commented May 30, 2024 •

edited

ExecSync did not return according to the timeout set in the request #10094

ExecSync did not return according to the timeout set in the request #10094

Comments

rtheis commented Apr 19, 2024

Description

Steps to reproduce the issue

Describe the results you received and expected

What version of containerd are you using?

Any other relevant information

Show configuration if it is related to CRI plugin.

fuweid commented Apr 19, 2024

rtheis commented Apr 25, 2024 • edited

fuweid commented Apr 25, 2024

jokemanfire commented May 27, 2024 • edited

fuweid commented May 27, 2024

jokemanfire commented May 27, 2024

Burning1020 commented May 27, 2024 • edited

fuweid commented May 27, 2024

jokemanfire commented May 28, 2024 • edited

fuweid commented May 28, 2024

jokemanfire commented May 28, 2024

fuweid commented May 30, 2024

jokemanfire commented May 30, 2024 • edited

rtheis commented Apr 25, 2024 •

edited

jokemanfire commented May 27, 2024 •

edited

Burning1020 commented May 27, 2024 •

edited

jokemanfire commented May 28, 2024 •

edited

jokemanfire commented May 30, 2024 •

edited