Description
//test/benchmarks/tcp:tcp_benchmark can run with netstack as the iperf client, server, or neither (native). As the server (and with host GRO/GSO enabled) I see throughput similar to native. As the client, I regularly see the same pattern: a few seconds of throughput at parity with Linux, followed by a complete cratering of throughput:
Looking at the logs, this appears to be triggered by netstack's inability to handle shrinking receive windows. In the pcap shrinkingWindowMini.pcap.zip (trimmed down to only the relevant packets to keep the file size manageable), you can see two things. First, there are several instances of "normal" full receive buffers / zero windows. These explain the graph's flat shape: transfer is limited by rwnd, not cwnd.
Second, at the end of the capture is the sequence of packets that corresponds to the massive throughput drop in the graph. There are two notable bits here:
1. There's an RTO-sized gap between the zero-window ACK and the next packet (which is our zero-window probe).
2. The receive window shrinks. This can't be seen in the sliced-up pcap (because it lacks the handshake with the window size), but it can be seen with the full log:
Note that the [TCP Window Full] packet has sequence number 319710246 and length 1920, indicating that it exactly fills the receive window. But the [TCP ZeroWindow] packet has the same sequence number, meaning that the 1920 sent bytes are now out of window. Those bytes go unacknowledged, so netstack treats this as an RTO and drops the cwnd all the way to 1 segment, causing the slowdown.
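For concreteness, here is a small standalone Go sketch of the "usable window" arithmetic from RFC 9293 Section 3.8.6.2.1. It is illustrative only: it borrows just the sequence number and segment length quoted above, and it ignores sequence-number wraparound.

```go
package main

import "fmt"

// usableWindow computes SND.UNA + SND.WND - SND.NXT per RFC 9293
// Section 3.8.6.2.1. Sequence-number wraparound is ignored for clarity.
func usableWindow(sndUna, sndWnd, sndNxt int64) int64 {
	return sndUna + sndWnd - sndNxt
}

func main() {
	const (
		sndUna = 319710246       // seq of the [TCP Window Full] segment
		segLen = 1920            // its length: it exactly fills the window
		sndNxt = sndUna + segLen // next sequence number after sending it
	)

	// At send time, the advertised window covered the segment exactly.
	fmt.Println(usableWindow(sndUna, segLen, sndNxt)) // 0: window full, but in-window

	// The receiver then shrinks the window to zero without acking the
	// segment, so the already-sent bytes land beyond SND.UNA+SND.WND.
	fmt.Println(usableWindow(sndUna, 0, sndNxt)) // -1920: sent bytes are out of window
}
```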
But per RFC 9293 3.8.6, netstack shouldn't consider those bytes relevant to an RTO:
A TCP receiver SHOULD NOT shrink the window, i.e., move the right
window edge to the left (SHLD-14). However, a sending TCP peer MUST
be robust against window shrinking, which may cause the "usable
window" (see Section 3.8.6.2.1) to become negative (MUST-34).
If this happens, the sender SHOULD NOT send new data (SHLD-15), but
SHOULD retransmit normally the old unacknowledged data between
SND.UNA and SND.UNA+SND.WND (SHLD-16). The sender MAY also
retransmit old data beyond SND.UNA+SND.WND (MAY-7), but SHOULD NOT
time out the connection if data beyond the right window edge is not
acknowledged (SHLD-17). If the window shrinks to zero, the TCP
implementation MUST probe it in the standard way (described below)
(MUST-35).
I.e., we should be treating this case like a regular zero window. In terms of a fix, we could maybe have RTO handling adjust cwnd only when the timed-out bytes are in-window.
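A minimal sketch of that idea, assuming a hypothetical sender type (these are not netstack's actual names, and real code would need netstack's seqnum helpers to handle wraparound):

```go
package tcp

// sender holds just the state needed for the sketch; netstack's real
// sender struct is considerably larger.
type sender struct {
	sndUna uint32 // SND.UNA: oldest unacknowledged sequence number
	sndWnd uint32 // SND.WND: most recently advertised receive window
	cwnd   int    // congestion window, in segments
}

// onRetransmitTimeout sketches the proposed fix: only collapse cwnd when
// the timed-out bytes are inside the receive window. If the receiver
// shrank the window past data we already sent, the timeout is not a loss
// signal; probe the window instead (RFC 9293 SHLD-17 / MUST-35).
func (s *sender) onRetransmitTimeout(seq, length uint32) {
	rightEdge := s.sndUna + s.sndWnd // wraparound ignored for clarity
	if seq+length <= rightEdge {
		// Genuine timeout on in-window data: standard RTO response.
		s.cwnd = 1
		s.retransmit(seq)
		return
	}
	// Out-of-window bytes: treat like a regular zero window and probe.
	s.sendZeroWindowProbe()
}

func (s *sender) retransmit(seq uint32) { /* elided */ }
func (s *sender) sendZeroWindowProbe()  { /* elided */ }
```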
Steps to reproduce
This has some excessive sudos left over from when I was testing XDP:
runsc version
docker version (if using docker)
repo state (if built from source)
release-20240415.0-18-g4810afc36
runsc debug logs (if available)
No response