Connection between Leafnode and Core NATS over satellite link fails to get established #5417

Closed
piotrgalecki opened this issue May 13, 2024 · 0 comments · Fixed by #5424
Labels
defect Suspected defect such as a bug or regression

@piotrgalecki
Observed behavior

Siden is running into an odd issue with NATS Leafnode not being able to connect to our Core NATS server.
After the TCP 3-way handshake, Core NATS sends an INFO message, the NATS Leafnode responds with INFO, and 400+ ms later the Leafnode closes the connection (sends TCP FIN).
The latency of the connection is 500 ms.

Here are the packets exchanged. Packets with the P flag are INFO messages, followed by the FIN packet:

22:00:32.440826 IP 10.99.0.99.34786 > 3.136.10.169.7422: Flags [S], seq 1607668652, win 65535, options [mss 1460,sackOK,TS val 2932985413 ecr 0,nop,wscale 8], length 0
22:00:32.980351 IP 3.136.10.169.7422 > 10.99.0.99.34786: Flags [S.], seq 2383013068, ack 1607668653, win 65535, options [mss 1460,sackOK,TS val 1779976004 ecr 2932985413,nop,wscale 8], length 0
22:00:32.980410 IP 10.99.0.99.34786 > 3.136.10.169.7422: Flags [.], ack 1, win 256, options [nop,nop,TS val 2932985953 ecr 1779976004], length 0
22:00:33.520223 IP 3.136.10.169.7422 > 10.99.0.99.34786: Flags [P.], seq 1:468, ack 1, win 256, options [nop,nop,TS val 1779976544 ecr 2932985953], length 467
22:00:33.520276 IP 10.99.0.99.34786 > 3.136.10.169.7422: Flags [.], ack 468, win 261, options [nop,nop,TS val 2932986493 ecr 1779976544], length 0
22:00:33.520719 IP 10.99.0.99.34786 > 3.136.10.169.7422: Flags [P.], seq 1:164, ack 468, win 261, options [nop,nop,TS val 2932986493 ecr 1779976544], length 163
22:00:33.981008 IP 10.99.0.99.34786 > 3.136.10.169.7422: Flags [F.], seq 164, ack 468, win 261, options [nop,nop,TS val 2932986954 ecr 1779976544], length 0
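
To make the timing in the capture easier to read, here is a small sketch that computes the gaps between the key packets. The timestamps are copied from the capture above; the dictionary keys and the `elapsed` helper are names invented for this illustration. By this arithmetic, the FIN leaves about 1.0006 s after the ACK that completes the handshake, which is consistent with a 1 s deadline that starts when the TCP connection is established. That reading is an inference from the timestamps, not something the capture proves.

```python
from datetime import datetime

# Timestamps copied from the capture above. Key names are illustrative.
CAPTURE = {
    "syn": "22:00:32.440826",         # leafnode -> server, [S]
    "syn_ack": "22:00:32.980351",     # server -> leafnode, [S.]
    "ack": "22:00:32.980410",         # leafnode -> server, handshake complete
    "server_info": "22:00:33.520223", # server -> leafnode, [P.] 467 bytes
    "leaf_reply": "22:00:33.520719",  # leafnode -> server, [P.] 163 bytes
    "fin": "22:00:33.981008",         # leafnode -> server, [F.]
}


def elapsed(start: str, end: str) -> float:
    """Return the seconds between two capture events, looked up by key."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(CAPTURE[end], fmt) - datetime.strptime(CAPTURE[start], fmt)
    return delta.total_seconds()


round_trip = elapsed("syn", "syn_ack")       # network round trip
info_wait = elapsed("ack", "server_info")    # wait for the server's first push
reply_to_fin = elapsed("leaf_reply", "fin")  # the "400+ ms" gap in the report
connect_to_fin = elapsed("ack", "fin")       # time from connect to close

print(f"SYN -> SYN/ACK round trip:   {round_trip:.6f} s")
print(f"ACK -> server INFO:          {info_wait:.6f} s")
print(f"leaf reply -> FIN:           {reply_to_fin:.6f} s")
print(f"handshake complete -> FIN:   {connect_to_fin:.6f} s")
```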

In the console, the NATS Leafnode outputs the following:
[7] [INF] 3.13.54.170:7422 - lid:53 - Leafnode connection closed: Read Error - Account: $G
[7] [INF] 3.13.54.170:7422 - lid:54 - Leafnode connection created for account: $G
[7] [INF] 3.13.54.170:7422 - lid:54 - Leafnode connection closed: Read Error - Account: $G

After investigation, we found that the issue is caused by a timeout that is too short: DEFAULT_LEAFNODE_INFO_WAIT, which is hardcoded to 1s.
Unfortunately, it is a constant and not configurable.
We changed the timeout value in the vendor directory to 3s, and that resolved the issue.

Please make this parameter configurable.
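
As an illustration of why a fixed wait can fail on a slow link, here is a minimal sketch of a read timeout on the first protocol message. It is not the nats-server implementation: `read_first_info`, `try_handshake`, and the delays are hypothetical, the delays are scaled down so the example runs quickly, and a local socket pair stands in for the satellite link.

```python
import socket
import threading
import time


def read_first_info(conn: socket.socket, info_wait: float) -> bytes:
    """Wait at most info_wait seconds for the peer's first message."""
    conn.settimeout(info_wait)  # plays the role of the read deadline
    try:
        return conn.recv(4096)
    finally:
        conn.settimeout(None)  # clear the timeout once the wait is over


def try_handshake(peer_delay: float, info_wait: float) -> str:
    """Simulate a peer that sends its first message after peer_delay seconds."""
    leaf, server = socket.socketpair()

    def slow_server() -> None:
        time.sleep(peer_delay)  # stands in for link latency
        try:
            server.sendall(b"INFO {}\r\n")
        except OSError:
            pass  # the leaf side already gave up and closed

    sender = threading.Thread(target=slow_server)
    sender.start()
    try:
        read_first_info(leaf, info_wait)
        return "connected"
    except socket.timeout:
        return "read error (timeout)"
    finally:
        leaf.close()
        sender.join()
        server.close()


# Scaled-down values: the peer answers after 0.3 s in both cases.
print("short wait:", try_handshake(peer_delay=0.3, info_wait=0.1))
print("long wait: ", try_handshake(peer_delay=0.3, info_wait=0.9))
```

With the same peer delay, only the length of the wait decides whether the connection survives, which is why a configurable value helps on high-latency links.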

Expected behavior

NATS connection can be established successfully

Server and client version

v2.10.10

Host environment

No response

Steps to reproduce

Test a NATS Leafnode to server connection over a high-latency link.

@piotrgalecki piotrgalecki added the defect Suspected defect such as a bug or regression label May 13, 2024
kozlovic added a commit that referenced this issue May 15, 2024
For the solicit side, we were using a constant of 1 second as a
TCP connection deadline waiting to receive the INFO protocol.

I made the change to be the same as for routes and gateways by
using the ping timer to detect a stale connection. In the handshake
process, the ping timer will be used to close the connection as
stale if the timer fires after the amount of time based on the
ping interval and max pings out. If the INFO is received and the
handshake proceeds, the ping timer is then set as usual to perform
the regular PING tasks.

Resolves #5417

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
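
The approach described in this commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the server's code: the class `HandshakeWatch`, its method names, and in particular the formula in `stale_after` (ping interval multiplied by max pings out plus one) are guesses at what "based on the ping interval and max pings out" could mean.

```python
import threading
import time


def stale_after(ping_interval: float, max_pings_out: int) -> float:
    """Hypothetical stale threshold; the exact formula is an assumption."""
    return ping_interval * (max_pings_out + 1)


class HandshakeWatch:
    """Closes a connection as stale if INFO does not arrive in time."""

    def __init__(self, ping_interval: float, max_pings_out: int, on_stale) -> None:
        self._lock = threading.Lock()
        self._on_stale = on_stale
        self.mode = "waiting_for_info"
        # One timer covers the handshake phase.
        self._timer = threading.Timer(
            stale_after(ping_interval, max_pings_out), self._fire_stale
        )
        self._timer.daemon = True
        self._timer.start()

    def _fire_stale(self) -> None:
        with self._lock:
            if self.mode != "waiting_for_info":
                return  # INFO arrived first; nothing to close
            self.mode = "closed_stale"
        self._on_stale()

    def info_received(self) -> bool:
        """Call when INFO arrives; returns False if already closed as stale."""
        with self._lock:
            if self.mode != "waiting_for_info":
                return False
            self._timer.cancel()
            # A real server would now arm its regular PING timer.
            self.mode = "pinging"
        return True


# Scaled-down demo: no INFO ever arrives, so the watch fires.
closed = []
watch = HandshakeWatch(0.05, 2, lambda: closed.append("stale"))
time.sleep(0.4)
print(watch.mode, closed)
```

The design point is that the handshake no longer needs its own fixed constant: the wait scales with settings the operator already controls.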
kozlovic added a commit that referenced this issue May 15, 2024
Added the leafnode remote configuration parameter `first_info_timeout`
which is the amount of time that a server creating a leafnode
connection will wait for the initial INFO from the remote server.

Resolves #5417

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
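
For orientation, a leafnode remote using this parameter might look like the fragment below. Only the name `first_info_timeout` and its placement on a leafnode remote come from the commit message; the host name is a placeholder, the `3s` value mirrors the workaround reported in this issue, and the quoted duration string is assumed to follow the usual server configuration conventions. Check the documentation for your server version before relying on it.

```
leafnodes {
  remotes [
    {
      # Placeholder host; replace with the real server address.
      url: "nats-leaf://hub.example.com:7422"

      # How long to wait for the initial INFO from the remote server.
      first_info_timeout: "3s"
    }
  ]
}
```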
derekcollison added a commit that referenced this issue May 16, 2024
neilalexander pushed a commit that referenced this issue May 16, 2024
wallyqs pushed a commit that referenced this issue May 16, 2024