
Server nodes behind NAT, pod networking is broken #10011

Open
NandoTheessen opened this issue Apr 23, 2024 · 5 comments


Environmental Info:
K3s Version:

k3s version v1.29.3+k3s1 (8aecc26)
go version go1.21.8

Node(s) CPU architecture, OS, and Version:
arm64 and amd64, Ubuntu 22.04

Cluster Configuration:
3 servers on a software-defined network behind a NAT gateway with a public IP
3 agents with public IPs

Flannel backend is wireguard-native; the required ports are allowed on the NAT gateway and the nodes (spot-check sketch below).
Servers are started with the external-IP flag set to the NAT gateway's public IP.
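
For reference, a rough way to spot-check the forwarded ports from outside the NAT (a sketch only; the port list follows the k3s networking docs, and <public-ip-nat> is a placeholder):

    # Spot-check the ports k3s needs through the NAT (run from outside the NAT)
    nc -vz <public-ip-nat> 6443        # Kubernetes API server
    nc -vz <public-ip-nat> 10250       # kubelet (logs/exec)
    nc -vzu <public-ip-nat> 51820      # wireguard-native (UDP; nc may report success even when filtered)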

Describe the bug:
Multiple issues:

Pods from the agent nodes can't reach pods on the server nodes.
Can't get logs from pods on the server nodes due to:

➜ ~ kubectl -n kube-system logs metallb-speaker-gnc5j Error from server: Get "https://<public-ip-nat>:10250/containerLogs/kube-system/metallb-speaker-gnc5j/metallb-speaker": proxy error from 127.0.0.1:6443 while dialing <public-ip-nat>:10250, code 502: 502 Bad Gateway

Steps To Reproduce:

  • Installed K3s:
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server" sh -s - --flannel-backend wireguard-native --token <token> --disable servicelb --write-kubeconfig-mode 644 --node-external-ip <public-ip-nat> --flannel-external-ip --disable traefik
    Installed the MetalLB Helm chart, which deploys a DaemonSet for the speakers (agent-join sketch below).
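
For completeness, a matching agent join might look roughly like this (a sketch only; the token and addresses are placeholders):

    # Hypothetical agent join pointed at the public endpoint (placeholders throughout)
    curl -sfL https://get.k3s.io | K3S_URL=https://<public-ip-nat>:6443 K3S_TOKEN=<token> \
      sh -s - --node-ip <agent-public-ip> --node-external-ip <agent-public-ip>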

Expected behavior:
The pods are able to communicate with each other, and I'm able to get logs from all pods.

Actual behavior:
As described above: pods can't communicate with each other, and I can't get logs from pods on the server nodes.

Additional context / logs:
Which logs would help? Happy to supply whatever is needed.

brandond (Contributor) commented Apr 23, 2024

Did you set --node-ip and --node-external-ip to the correct values for each of the agents, or just the servers?

Based on the information you shared, it sounds like the apiserver is trying to connect to the kubelet's external IP to get logs. Normally it would connect to the public IP using the agent tunnel, so I suspect that the internal and external IPs are not being set properly.
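
A quick way to verify what each node actually registered (a sketch; <node-name> is a placeholder):

    kubectl get nodes -o wide                                  # INTERNAL-IP / EXTERNAL-IP columns per node
    kubectl describe node <node-name> | grep -A5 'Addresses:'  # InternalIP / ExternalIP entries in detail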

NandoTheessen (Author) commented Apr 24, 2024

Thanks for the help, Brandon!

For the servers, node-ip defaults to the private IP (I believe), which would be 192.x.x.x.
The agents don't have these set, as they only have public IP addresses, which are used as the nodes' IPs.

Should I set node-ip and node-external-ip specifically to their public addresses?

This is the current setup:

Server node 1: node-ip not set, node-external-ip set to NAT gateway
Server node 2: node-ip not set, node-external-ip set to NAT gateway
Server node 3: node-ip not set, node-external-ip set to NAT gateway

Agent 1: node-ip and node-external-ip not set; the node has only a public IP
Agent 2: node-ip and node-external-ip not set; the node has only a public IP
Agent 3: node-ip and node-external-ip not set; the node has only a public IP
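
To double-check which address flannel will advertise for each node, one option (assuming flannel's standard node annotations, e.g. flannel.alpha.coreos.com/public-ip) is:

    # Inspect the flannel annotations on a node (annotation names assumed from upstream flannel)
    kubectl get node <node-name> -o yaml | grep 'flannel.alpha'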

tdtgit commented Apr 25, 2024

Related to #7355, I think. Based on the comment at #7355 (comment), I'm still unable to get it working right.

NandoTheessen (Author) commented Apr 26, 2024

Thanks for linking that issue, @tdtgit!
I don't think it is the same, but it helped me identify the issue a little bit better.
I'm not entirely sure whether what I'm trying to achieve is even possible, mind you; concretely, this is where I'm doubtful:

Since I only have one NAT gateway, I only have one public IP address. So this is my server config:

MY_EXTERNAL_IP=80.xxx.xxx.xxx
server 1: --node-ip 192.168.88.2 --node-external-ip ${MY_EXTERNAL_IP} --flannel-external-ip
server 2:  --node-ip <internal-ip> --node-external-ip ${MY_EXTERNAL_IP} --flannel-external-ip
server 3:  --node-ip <internal-ip> --node-external-ip ${MY_EXTERNAL_IP} --flannel-external-ip

My agent config:

agent 1: --node-ip 80.xxx.xxx.xxx --node-external-ip 80.xxx.xxx.xxx --server https://80.xxx.xxx.xxx:6443
agent 2: --node-ip 80.xxx.xxx.xxx --node-external-ip 80.xxx.xxx.xxx --server https://80.xxx.xxx.xxx:6443
agent 3: --node-ip 80.xxx.xxx.xxx --node-external-ip 80.xxx.xxx.xxx --server https://80.xxx.xxx.xxx:6443

Here is some additional information:

  • Pods deployed on server 1 through server 3 are not able to contact services (e.g. the Kubernetes API)
  • When running wg show, I get this output (server 3):
interface: flannel-wg
  public key: xxxx
  private key: (hidden)
  listening port: 51820

peer: xxxx
  endpoint: 80.xxx.xxx.xxx:51820
  allowed ips: 10.42.3.0/24
  latest handshake: 41 seconds ago
  transfer: 1.96 KiB received, 764 B sent
  persistent keepalive: every 25 seconds

peer: xxxx
  endpoint: 80.xxx.xxx.xxx:51820
  allowed ips: 10.42.4.0/24
  latest handshake: 1 minute, 27 seconds ago
  transfer: 1.23 KiB received, 1.27 KiB sent
  persistent keepalive: every 25 seconds

peer: xxxx
  endpoint: 80.xxx.xxx.xxx:51820
  allowed ips: 10.42.5.0/24
  latest handshake: 1 minute, 38 seconds ago
  transfer: 556.30 KiB received, 282.04 KiB sent
  persistent keepalive: every 25 seconds

peer: xxxx
  endpoint: 80.xxx.xxx.xxx:51820
  allowed ips: 10.42.0.0/24
  transfer: 0 B received, 17.63 KiB sent
  persistent keepalive: every 25 seconds

What I can see from this is that server 3 has only peered with one other server instead of two!
We're missing a peer here, and I assume that is related to the NAT gateway forwarding all traffic to one server (server 1).
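
A quick way to compare the peer lists across the servers is something like this (a sketch; run on each server node):

    # Each server should list one peer per other node; the endpoints show which address each peer resolved to
    sudo wg show flannel-wg peers | wc -l
    sudo wg show flannel-wg endpoints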

NandoTheessen (Author) commented
I've indeed managed to fix this by assigning public IPs to all of my servers.
One last issue persists, though.

I can read the logs from all pods except the ones on server 2 and server 3; there I receive a "502: Bad Gateway" error.
I'm sure this has been spotted in the wild before; could you give me some pointers?
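
In case it helps narrow the 502 down, a rough reachability check of the kubelet port from the node running the apiserver (a sketch; the address is a placeholder):

    # Any HTTP status code (e.g. 401) means 10250 is reachable; a timeout points at the NAT/firewall path
    curl -sk -o /dev/null -w '%{http_code}\n' https://<server2-internal-ip>:10250/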
