[Question] Unable to do "kubectl logs" for pods running in edge node. #1838

Open
chunfungintel opened this issue Dec 1, 2023 · 26 comments
Labels: kind/question

@chunfungintel

chunfungintel commented Dec 1, 2023

What happened:
Unable to run "kubectl logs" for pods on the edge node.

What you expected to happen:
Logs of pods on the edge node can be viewed.

How to reproduce it (as minimally and precisely as possible):
Control-plane setup:
Kubernetes version:
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-22T13:33:14Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes initialization:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

OpenYurt installation:
helm upgrade --install yurt-manager -n kube-system openyurt/yurt-manager --version 1.3.4
helm upgrade --install yurt-hub -n kube-system --set kubernetesServerAddr=https://${KUBERNETES_SERVER_ADDRESS}:6443 openyurt/yurthub --version 1.3.4
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent

Edge node:
Installation:
sudo rm $(which kubelet kubeadm kubectl)
wget https://github.com/openyurtio/openyurt/releases/download/v1.3.4/yurtadm-v1.3.4-linux-amd64.zip
unzip yurtadm-v1.3.4-linux-amd64.zip
sudo cp linux-amd64/yurtadm /usr/local/bin/yurtadm &&
sudo chmod +x /usr/local/bin/yurtadm

Joining:
sudo yurtadm join \
${CONTROL_PANEL_ADDRESS}:6443 \
--token=${JOIN_TOKEN} --node-type=edge \
--cri-socket=unix:///run/containerd/containerd.sock \
--discovery-token-ca-cert-hash=${CA_HASH} --v=5

kubectl logs -n kube-system raven-agent-ds-r7lbl
Error from server: Get "https://192.168.0.111:10250/containerLogs/kube-system/raven-agent-ds-r7lbl/raven-agent": dial tcp 192.168.0.111:10250: i/o timeout
NAMESPACE      NAME                                               READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-ncq42                              1/1     Running   0          21h
kube-flannel   kube-flannel-ds-pjl2w                              1/1     Running   0          21h
kube-system    coredns-bd6b6df9f-4bwgn                            1/1     Running   0          21h
kube-system    coredns-bd6b6df9f-f9dwj                            1/1     Running   0          21h
kube-system    etcd-adl-control                                   1/1     Running   0          21h
kube-system    kube-apiserver-adl-control                         1/1     Running   0          21h
kube-system    kube-controller-manager-adl-control                1/1     Running   0          21h
kube-system    kube-proxy-4ct9t                                   1/1     Running   0          21h
kube-system    kube-proxy-55vls                                   1/1     Running   0          21h
kube-system    kube-scheduler-adl-control                         1/1     Running   0          21h
kube-system    raven-agent-ds-2hvct                               1/1     Running   0          21h
kube-system    raven-agent-ds-r7lbl                               1/1     Running   0          21h
kube-system    yurt-hub-ubuntu-platform                           1/1     Running   0          21h
kube-system    yurt-manager-7f5bbb5744-fp5m8                      1/1     Running   0          21h

Anything else we need to know?:
The control-plane node is in subnet 10.226.76.0/23, while the edge node is in 192.168.0.0/24.
I am able to join the edge node and deploy workloads, but I cannot view their logs.
Which step did I miss?
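
A note on the addressing above: the error shows the API server dialing the kubelet at 192.168.0.111:10250, an address outside the control plane's 10.226.76.0/23 subnet. The following is a minimal shell sketch that checks whether an IPv4 address falls inside a CIDR block, which makes the mismatch explicit. The helper names ip_to_int and ip_in_cidr are illustrative and not part of kubectl, OpenYurt, or Raven.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  # Run in a subshell so the caller's IFS and positional parameters are untouched.
  (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  )
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2, non-zero otherwise.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# Edge kubelet address from the error message vs. the control-plane subnet.
if ip_in_cidr 192.168.0.111 10.226.76.0/23; then
  echo "same subnet"
else
  echo "different subnets: a route or tunnel is needed"
fi
```

With the values from this report the second message is printed. That alone does not prove the nodes cannot reach each other, but it suggests that the API server has no direct path to the edge kubelet unless a route or tunnel is set up between the two networks.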

Environment:

  • OpenYurt version: 1.3.4
  • Kubernetes version (use kubectl version): 1.23.17
  • OS (e.g: cat /etc/os-release): Ubuntu 22.04.3 LTS
  • Kernel (e.g. uname -a): 6.2.0-37-generic
  • Install tools:
  • Others:

others
/kind question

chunfungintel added the kind/question label Dec 1, 2023
@YTGhost
Member

YTGhost commented Dec 1, 2023

@chunfungintel Hi, I think you should deploy Raven like this to enable node IP forwarding:

helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true

After that, you need to create the Gateway CR, see here

@chunfungintel
Author

Hi,

Thank you for your suggestion.

I modified my steps as below:

  1. Raven deployment change (note: the Raven 0.4.0 image is not available yet):
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true \
--set image.tag=latest --version 0.4.0
  2. Node labelling:
# Edge node
kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge
# Cloud node
kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud
  3. Gateway settings:
cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
EOF
  4. Modified the Raven agent according to the tutorial (https://github.com/openyurtio/raven/blob/main/docs/raven-agent-tutorial.md#install-raven-agent). I can see the Raven pods restart after deploying.
make deploy
bash hack/gen-yaml.sh openyurt/raven-agent:latest libreswan false ":8080"
==== create raven-agent.yaml in /home/chunfung/Github/raven/_output/yamls ====
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
kubectl apply -f _output/yamls/raven-agent.yaml
Warning: resource serviceaccounts/raven-agent-account is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
serviceaccount/raven-agent-account configured
Warning: resource clusterroles/raven-agent-role is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrole.rbac.authorization.k8s.io/raven-agent-role configured
Warning: resource clusterrolebindings/raven-agent-role-binding is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrolebinding.rbac.authorization.k8s.io/raven-agent-role-binding configured
Warning: resource configmaps/raven-agent-config is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
configmap/raven-agent-config configured
Warning: resource secrets/raven-agent-secret is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
secret/raven-agent-secret configured
Warning: resource daemonsets/raven-agent-ds is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
daemonset.apps/raven-agent-ds configured

Unfortunately, I am still unable to run 'kubectl logs' for pods on the edge node. Any ideas? :)

@YTGhost
Member

YTGhost commented Dec 4, 2023

@chunfungintel I think you should use v0.3.2 instead of v0.4 for the Raven image if you are still deploying OpenYurt v1.3.

@chunfungintel
Author

Hi @YTGhost

Actually, these are the only versions available in Helm:

helm search repo raven-agent --versions
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
openyurt/raven-agent    0.4.0           0.4.0           A Helm chart for Kubernetes
openyurt/raven-agent    0.1.1           0.2.0           A Helm chart for Kubernetes
openyurt/raven-agent    0.1.0           0.2.0           A Helm chart for Kubernetes

I do not specifically need to use OpenYurt v1.3. Is there a version you would recommend?

It seems that in some version the Raven controller was merged into yurt-manager (correct me if I am wrong). Is that a version before 1.3?

@YTGhost
Member

YTGhost commented Dec 4, 2023

> I do not specifically need to use OpenYurt v1.3. Is there a version you would recommend?

@chunfungintel The earlier versions of Raven's chart do not seem to have been maintained very well. I think you can use OpenYurt v1.4, since Raven v0.4 upgraded the CRD. You can also stay on OpenYurt v1.3, but you may have to modify Raven's chart manually, for example by using chart version 0.1.1 and setting the raven-agent image version to v0.3.2.

> It seems that in some version the Raven controller was merged into yurt-manager (correct me if I am wrong). Is that a version before 1.3?

We merged raven-controller-manager into yurt-manager in v1.3, so in v1.3 and later you only need to install yurt-manager.
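
The pairing suggested in this reply can be written down as a small lookup. This is only a sketch of the advice given here (OpenYurt v1.3 with Raven v0.3.2, OpenYurt v1.4 with Raven 0.4.0), not an official compatibility matrix, and the function name raven_version_for is hypothetical.

```shell
# Map an OpenYurt version to the Raven agent version suggested in this thread.
# Prints "unknown" and returns non-zero for versions the thread does not cover.
raven_version_for() {
  case "$1" in
    1.3|1.3.*) echo "0.3.2" ;;
    1.4|1.4.*) echo "0.4.0" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# The OpenYurt version from the original report.
raven_version_for 1.3.4
```

The printed value is a version number only. How it maps to a chart version or an image tag still has to be checked against the Helm repository, as the chart listing earlier in the thread shows.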

@chunfungintel
Author

Revised steps:

Control-plane initialization:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl taint nodes --all node-role.kubernetes.io/master-

Using OpenYurt 1.4.0 + Raven agent 0.4.0

helm upgrade --install yurt-manager -n kube-system openyurt/yurt-manager --version 1.4.0 --set image.tag=latest
helm upgrade --install yurt-hub -n kube-system --set kubernetesServerAddr=https://${KUBERNETES_SERVER_ADDRESS}:6443 openyurt/yurthub --version 1.4.0
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true \
--set image.tag=0.4.0 --version 0.4.0

Install OpenYurt 1.4 on the edge node:

wget https://github.com/openyurtio/openyurt/releases/download/v1.4.0/yurtadm-v1.4.0-linux-amd64.tar.gz
tar -xvf yurtadm-v1.4.0-linux-amd64.tar.gz
sudo cp linux-amd64/yurtadm /usr/local/bin/yurtadm && sudo chmod +x /usr/local/bin/yurtadm

Edge node joining:

sudo yurtadm join \
${CONTROL_PANEL_ADDRESS}:6443 \
--token=${JOIN_TOKEN} --node-type=edge \
--cri-socket=unix:///run/containerd/containerd.sock \
--discovery-token-ca-cert-hash=${CA_HASH} --v=5

Gateway configuration:

kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge; \
kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud

cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
EOF

git clone https://github.com/openyurtio/raven.git
cd raven && git checkout v0.4.0
make deploy

Results:
Still unable to run 'kubectl logs' for pods on the edge node.

Is anything still missing?

@YTGhost
Member

YTGhost commented Dec 8, 2023

@chunfungintel Hi, could you please provide the logs of raven-agent?

@chunfungintel
Author

chunfungintel commented Dec 8, 2023

@YTGhost
These are the logs from the control plane only:

W1208 03:07:36.262826 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1208 03:07:36.283129 1 start.go:61] Start raven agent
I1208 03:07:36.283810 1 engine.go:69] RavenEngine: engine successfully start
I1208 03:07:36.385317 1 engine.go:107] "RavenEngine: adding gateway gw-edge"
I1208 03:07:36.385426 1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1208 03:07:36.385472 1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1208 03:07:36.385541 1 engine.go:107] "RavenEngine: adding gateway gw-cloud"
I1208 03:07:36.385567 1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1208 03:07:36.385594 1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1208 03:07:36.385897 1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1208 03:07:36.390257 1 tunnel.go:80] RavenEngine: route driver vxlan initialized
I1208 03:07:36.393717 1 libreswan.go:363] starting pluto
Initializing NSS database

I1208 03:07:37.395489 1 libreswan.go:385] start pluto successfully
I1208 03:07:37.395594 1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
E1208 03:09:07.398446 1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-cloud"
I1208 03:09:07.398573 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1208 03:09:07.398598 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
I1208 03:09:07.398698 1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[]
I1208 03:09:07.398723 1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
I1208 03:09:07.420646 1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
I1208 03:09:07.467244 1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
E1208 03:10:37.470153 1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-cloud"
I1208 03:10:37.470231 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1208 03:10:37.470261 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
I1208 03:10:37.470303 1 tunnelagent.go:109] network not changed, skip to process

It seems the configuration failed because I am behind a corporate proxy?
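
To pull the failures out of a log like the one above, error-level lines can be filtered by their klog header (a severity letter, then MMDD, then a timestamp). A small sketch, where the function name klog_errors is illustrative and the sample input is copied from the log above:

```shell
# Keep only klog lines at error level: "E" + MMDD + space + HH:MM:SS.
# The header may start the line or follow other text, such as the
# timestamp and stream prefix found in files under /var/log/pods.
klog_errors() {
  grep -E '(^|[[:space:]])E[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.'
}

# Sample input taken from the control-plane log in this comment.
klog_errors <<'EOF'
I1208 03:07:37.395594 1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
E1208 03:09:07.398446 1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-cloud"
I1208 03:09:07.398573 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
EOF
```

Only the tunnelagent.go:92 line passes the filter. The same function can be fed from kubectl logs for a pod whose node is reachable, or from a log file copied off the edge node.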

@YTGhost
Member

YTGhost commented Dec 8, 2023

@chunfungintel I think so. Raven sends requests to public services to discover the gateway's public IP, and in your network environment those requests appear to fail.

If there is no way to obtain it automatically, you can look it up manually and set the PublicIP field directly in the CR.
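
A sketch of what setting the address by hand could look like, reusing the Gateway manifest posted earlier in the thread. It assumes the endpoint field is spelled publicIP in the raven.openyurt.io/v1alpha1 schema, which should be checked against the installed CRD. The function name render_gateway is hypothetical, and 203.0.113.10 is a documentation-range placeholder, not an address from this cluster.

```shell
# Print a Gateway manifest with an explicit public IP on its single endpoint.
# Arguments: gateway name, node name, underNAT (true|false), public IP.
# Assumption: the endpoint field is named "publicIP" in the installed CRD.
render_gateway() {
  cat <<EOF
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: $1
spec:
  endpoints:
    - nodeName: $2
      underNAT: $3
      publicIP: $4
EOF
}

# Only prints the manifest; nothing is applied to a cluster here.
render_gateway gw-cloud adl-cloud-node false 203.0.113.10
```

Once the output looks right, it can be piped into kubectl apply -f - in the same way as the heredoc used earlier in the thread.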

@YTGhost
Member

YTGhost commented Dec 12, 2023

@chunfungintel Hi, has this been resolved or any progress made?

@chunfungintel
Author

@YTGhost Actually, I was collecting logs when you asked :)

What I currently do is inject http_proxy, https_proxy, and no_proxy with:

kubectl edit daemonsets.apps -n kube-system raven-agent-ds
        env:
        - name: http_proxy
          value: http://PROXY_NAME:PORT
        - name: https_proxy
          value: http://PROXY_NAME:PORT
        - name: no_proxy
          value: 169.254.2.1/32,10.0.0.0/8,192.168.0.0/16,localhost,.local,127.0.0.0/8,172.16.0.0/12,134.134.0.0/16,10.226.76.0/23,.svc,kube-system.svc,192.168.0.0/24
        - name: HTTP_PROXY
          value: http://PROXY_NAME:PORT
        - name: HTTPS_PROXY
          value: http://PROXY_NAME:PORT
        - name: NO_PROXY
          value: 169.254.2.1/32,10.0.0.0/8,192.168.0.0/16,localhost,.local,127.0.0.0/8,172.16.0.0/12,134.134.0.0/16,10.226.76.0/23,.svc,kube-system.svc,192.168.0.0/24
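
The same variables could probably be set without an interactive edit by passing KEY=VALUE pairs to kubectl set env. A sketch of generating those pairs follows. The function name proxy_env_pairs is illustrative, PROXY_NAME:PORT is the placeholder used above, and the no_proxy list is shortened here for readability.

```shell
# Print KEY=VALUE pairs for the lower- and upper-case proxy variables.
# Arguments: proxy URL, comma-separated no_proxy list.
proxy_env_pairs() {
  for name in http_proxy https_proxy HTTP_PROXY HTTPS_PROXY; do
    printf '%s=%s\n' "$name" "$1"
  done
  printf 'no_proxy=%s\nNO_PROXY=%s\n' "$2" "$2"
}

# Only prints the pairs; substitute the real proxy and the full no_proxy list.
proxy_env_pairs "http://PROXY_NAME:PORT" "10.226.76.0/23,192.168.0.0/24,.svc,localhost"

# Possible use against the DaemonSet from this thread (not executed here):
#   kubectl -n kube-system set env daemonset/raven-agent-ds $(proxy_env_pairs "$PROXY" "$NO_PROXY_LIST")
```

Unquoted command substitution is acceptable in the commented command only because none of the values contain spaces.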

Raven's logs from the control plane:

W1219 05:49:53.701467       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1219 05:49:53.711849       1 start.go:61] Start raven agent
I1219 05:49:53.711976       1 engine.go:69] RavenEngine: engine successfully start
I1219 05:56:08.179137       1 engine.go:107] "RavenEngine: adding gateway gw-edge"
I1219 05:56:08.179159       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:08.179169       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:08.179210       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:08.185324       1 engine.go:107] "RavenEngine: adding gateway gw-cloud"
I1219 05:56:08.185340       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:08.185348       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1219 05:56:08.185681       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
I1219 05:56:08.185696       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:08.185703       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:08.185762       1 tunnel.go:80] RavenEngine: route driver vxlan initialized
I1219 05:56:08.186634       1 libreswan.go:363] starting pluto
I1219 05:56:08.191901       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
I1219 05:56:08.191916       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:08.191924       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
Initializing NSS database

I1219 05:56:09.187474       1 libreswan.go:385] start pluto successfully
I1219 05:56:09.187628       1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
I1219 05:56:11.682578       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
I1219 05:56:11.682643       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:11.682680       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1219 05:56:11.684039       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
I1219 05:56:11.684187       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1219 05:56:11.684374       1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[]
I1219 05:56:11.684473       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
I1219 05:56:11.700547       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
I1219 05:56:11.700624       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:11.700666       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1219 05:56:11.709070       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
I1219 05:56:11.746785       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.746981       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1219 05:56:11.747062       1 tunnelagent.go:113] "applying network" localEndpoint="10.226.76.105" remoteEndpoint=map[]
I1219 05:56:11.747082       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
I1219 05:56:11.762037       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
I1219 05:56:11.784486       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
I1219 05:56:11.784538       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:11.784566       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:11.797713       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
I1219 05:56:11.797737       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:11.797754       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:11.818785       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:11.819122       1 tunnelagent.go:113] "applying network" localEndpoint="10.226.76.105" remoteEndpoint=map[gw-edge:192.168.0.111]
I1219 05:56:11.822511       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.226.76.105/32-10.244.1.0/24 --id @10.226.76.105-10.226.76.105/32-10.244.1.0/24 --host 10.226.76.105 --client 10.226.76.105/32 --ikeport 4500 --to --id @192.168.0.111-10.244.1.0/24-10.226.76.105/32 --host %any --client 10.244.1.0/24] output="002 \"10.226.76.105-192.168.0.111-10.226.76.105/32-10.244.1.0/24\": added IKEv2 connection\n"
I1219 05:56:11.835984       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.226.76.105/32-192.168.0.111/32 --id @10.226.76.105-10.226.76.105/32-192.168.0.111/32 --host 10.226.76.105 --client 10.226.76.105/32 --ikeport 4500 --to --id @192.168.0.111-192.168.0.111/32-10.226.76.105/32 --host %any --client 192.168.0.111/32] output="002 \"10.226.76.105-192.168.0.111-10.226.76.105/32-192.168.0.111/32\": added IKEv2 connection\n"
I1219 05:56:11.842535       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.244.0.0/24-10.244.1.0/24 --id @10.226.76.105-10.244.0.0/24-10.244.1.0/24 --host 10.226.76.105 --client 10.244.0.0/24 --ikeport 4500 --to --id @192.168.0.111-10.244.1.0/24-10.244.0.0/24 --host %any --client 10.244.1.0/24] output="002 \"10.226.76.105-192.168.0.111-10.244.0.0/24-10.244.1.0/24\": added IKEv2 connection\n"
I1219 05:56:11.849240       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.244.0.0/24-192.168.0.111/32 --id @10.226.76.105-10.244.0.0/24-192.168.0.111/32 --host 10.226.76.105 --client 10.244.0.0/24 --ikeport 4500 --to --id @192.168.0.111-192.168.0.111/32-10.244.0.0/24 --host %any --client 192.168.0.111/32] output="002 \"10.226.76.105-192.168.0.111-10.244.0.0/24-192.168.0.111/32\": added IKEv2 connection\n"
I1219 05:56:11.853424       1 vxlan.go:81] Tunnel: only gateway node exist in current gateway, cleaning up route setting
I1219 05:56:11.910735       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.911057       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911077       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.911198       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911224       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.911335       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911349       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:11.911477       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911495       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:11.911600       1 tunnelagent.go:109] network not changed, skip to process

Raven's logs from the edge node (grabbed from /var/log/pods/kube-system_raven-agent-ds):

2023-12-19T13:55:18.916883189+08:00 stderr F W1219 05:55:18.916769       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-12-19T13:55:19.024656536+08:00 stderr F I1219 05:55:19.024571       1 start.go:61] Start raven agent
2023-12-19T13:55:19.024698613+08:00 stderr F I1219 05:55:19.024656       1 engine.go:69] RavenEngine: engine successfully start
2023-12-19T13:56:08.181413498+08:00 stderr F I1219 05:56:08.181226       1 engine.go:107] "RavenEngine: adding gateway gw-edge"
2023-12-19T13:56:08.181434171+08:00 stderr F I1219 05:56:08.181236       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:08.181436027+08:00 stderr F I1219 05:56:08.181241       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:08.181437719+08:00 stderr F I1219 05:56:08.181271       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:08.187248523+08:00 stderr F I1219 05:56:08.187019       1 engine.go:107] "RavenEngine: adding gateway gw-cloud"
2023-12-19T13:56:08.187261786+08:00 stderr F I1219 05:56:08.187029       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:08.187263583+08:00 stderr F I1219 05:56:08.187034       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:08.18743989+08:00 stderr F I1219 05:56:08.187357       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
2023-12-19T13:56:08.187445895+08:00 stderr F I1219 05:56:08.187368       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:08.187447777+08:00 stderr F I1219 05:56:08.187376       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:08.193960361+08:00 stderr F I1219 05:56:08.193744       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
2023-12-19T13:56:08.193973497+08:00 stderr F I1219 05:56:08.193752       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:08.193975305+08:00 stderr F I1219 05:56:08.193758       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:08.200998585+08:00 stderr F I1219 05:56:08.200839       1 tunnel.go:80] RavenEngine: route driver vxlan initialized
2023-12-19T13:56:08.201821126+08:00 stderr F I1219 05:56:08.201771       1 libreswan.go:363] starting pluto
2023-12-19T13:56:08.377037145+08:00 stdout F Initializing NSS database
2023-12-19T13:56:08.377049455+08:00 stdout F
2023-12-19T13:56:09.204237114+08:00 stderr F I1219 05:56:09.204026       1 libreswan.go:385] start pluto successfully
2023-12-19T13:56:09.204253142+08:00 stderr F I1219 05:56:09.204076       1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
2023-12-19T13:56:11.684339366+08:00 stderr F I1219 05:56:11.684084       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
2023-12-19T13:56:11.684369557+08:00 stderr F I1219 05:56:11.684099       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:11.684370606+08:00 stderr F I1219 05:56:11.684108       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:11.70151363+08:00 stderr F I1219 05:56:11.701442       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
2023-12-19T13:56:11.701544862+08:00 stderr F I1219 05:56:11.701452       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:11.701545796+08:00 stderr F I1219 05:56:11.701466       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:11.786530579+08:00 stderr F I1219 05:56:11.786351       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
2023-12-19T13:56:11.786537547+08:00 stderr F I1219 05:56:11.786368       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:11.786538571+08:00 stderr F I1219 05:56:11.786378       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:11.786548476+08:00 stderr F I1219 05:56:11.786414       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
2023-12-19T13:56:11.786549519+08:00 stderr F I1219 05:56:11.786421       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
2023-12-19T13:56:11.786550289+08:00 stderr F I1219 05:56:11.786446       1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[]
2023-12-19T13:56:11.786551073+08:00 stderr F I1219 05:56:11.786450       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
2023-12-19T13:56:11.7949514+08:00 stderr F I1219 05:56:11.794852       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
2023-12-19T13:56:11.799016888+08:00 stderr F I1219 05:56:11.798999       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
2023-12-19T13:56:11.799023241+08:00 stderr F I1219 05:56:11.799011       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:11.799025304+08:00 stderr F I1219 05:56:11.799020       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:11.851155499+08:00 stderr F I1219 05:56:11.851014       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.851166507+08:00 stderr F I1219 05:56:11.851054       1 tunnelagent.go:113] "applying network" localEndpoint="192.168.0.111" remoteEndpoint=map[gw-cloud:10.226.76.105]
2023-12-19T13:56:11.854136169+08:00 stderr F I1219 05:56:11.853981       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32 --id @192.168.0.111-10.244.1.0/24-10.226.76.105/32 --host 192.168.0.111 --client 10.244.1.0/24 --to --id @10.226.76.105-10.226.76.105/32-10.244.1.0/24 --host 192.198.146.186 --client 10.226.76.105/32 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32\": added IKEv2 connection\n"
2023-12-19T13:56:11.867305679+08:00 stderr F I1219 05:56:11.867173       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32] output=""
2023-12-19T13:56:11.867637563+08:00 stderr F I1219 05:56:11.867580       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32] output="181 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32\" #1: initiating IKEv2 connection\n"
2023-12-19T13:56:11.868168601+08:00 stderr F I1219 05:56:11.868097       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24 --id @192.168.0.111-10.244.1.0/24-10.244.0.0/24 --host 192.168.0.111 --client 10.244.1.0/24 --to --id @10.226.76.105-10.244.0.0/24-10.244.1.0/24 --host 192.198.146.186 --client 10.244.0.0/24 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24\": added IKEv2 connection\n"
2023-12-19T13:56:11.874707837+08:00 stderr F I1219 05:56:11.874657       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24] output=""
2023-12-19T13:56:11.875043389+08:00 stderr F I1219 05:56:11.875026       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24] output="181 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24\" #2: initiating IKEv2 connection\n"
2023-12-19T13:56:11.875626108+08:00 stderr F I1219 05:56:11.875599       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32 --id @192.168.0.111-192.168.0.111/32-10.226.76.105/32 --host 192.168.0.111 --client 192.168.0.111/32 --to --id @10.226.76.105-10.226.76.105/32-192.168.0.111/32 --host 192.198.146.186 --client 10.226.76.105/32 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32\": added IKEv2 connection\n"
2023-12-19T13:56:11.875853109+08:00 stderr F I1219 05:56:11.875843       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32] output=""
2023-12-19T13:56:11.876065435+08:00 stderr F I1219 05:56:11.876056       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32] output="181 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32\" #3: initiating IKEv2 connection\n"
2023-12-19T13:56:11.876528375+08:00 stderr F I1219 05:56:11.876490       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24 --id @192.168.0.111-192.168.0.111/32-10.244.0.0/24 --host 192.168.0.111 --client 192.168.0.111/32 --to --id @10.226.76.105-10.244.0.0/24-192.168.0.111/32 --host 192.198.146.186 --client 10.244.0.0/24 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24\": added IKEv2 connection\n"
2023-12-19T13:56:11.876711446+08:00 stderr F I1219 05:56:11.876701       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24] output=""
2023-12-19T13:56:11.876973883+08:00 stderr F I1219 05:56:11.876935       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24] output="181 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24\" #4: initiating IKEv2 connection\n"
2023-12-19T13:56:11.876979496+08:00 stderr F I1219 05:56:11.876947       1 vxlan.go:81] Tunnel: only gateway node exist in current gateway, cleaning up route setting
2023-12-19T13:56:11.939109516+08:00 stderr F I1219 05:56:11.938954       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:11.939132288+08:00 stderr F I1219 05:56:11.939043       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939133533+08:00 stderr F I1219 05:56:11.939051       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.939134365+08:00 stderr F I1219 05:56:11.939089       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939135243+08:00 stderr F I1219 05:56:11.939094       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.939136024+08:00 stderr F I1219 05:56:11.939122       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939136804+08:00 stderr F I1219 05:56:11.939127       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.939254413+08:00 stderr F I1219 05:56:11.939165       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939255434+08:00 stderr F I1219 05:56:11.939170       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:11.93925623+08:00 stderr F I1219 05:56:11.939197       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939257033+08:00 stderr F I1219 05:56:11.939201       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:11.939257895+08:00 stderr F I1219 05:56:11.939229       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:57:03.556793336+08:00 stderr F I1219 05:57:03.556605       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF

My guess from the last line of the edge Raven log is that the request was routed through a proxy because no_proxy did not cover it:
2023-12-19T13:57:03.556793336+08:00 stderr F I1219 05:57:03.556605 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF

I have another observation too, which I will share in a separate comment.

@chunfungintel
Author

Another observation after setting up the gateway: the edge node became "NotReady" shortly afterwards:

NAME             STATUS     ROLES                  AGE    VERSION
adl-cloud-node   Ready      control-plane,master   127m   v1.23.17
adl-edge-node    NotReady   <none>                 119m   v1.23.17

The YurtHub logs show that it failed to connect to the control plane:
2023-12-19T22:36:59.056936515+08:00 stderr F E1219 14:36:59.056667 1 prober.go:97] failed to probe: backoff ensure lease error: Get "https://10.226.76.105:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/adl-edge-node?timeout=2s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), remote server https://10.226.76.105:6443
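
One way to narrow down an error like this is to check, from the edge node, whether a plain TCP connection to the API server address can be opened at all. The following is a minimal sketch, not part of YurtHub: the helper name `can_connect` is hypothetical, and the 2-second default simply mirrors the `timeout=2s` in the request above. It only tests TCP reachability, not TLS or authentication.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, unreachable networks and timeouts
        return False
```

For example, `can_connect("10.226.76.105", 6443)` run on the edge node would show whether the address from the log line is reachable directly; a `False` result would point at routing or firewall problems rather than at YurtHub itself.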

@chunfungintel
Author

@YTGhost Can I know how to do this?

If there is no way to get it automatically, you can also get it manually and set the PublicIP field directly in the CR.

@YTGhost
Member

YTGhost commented Dec 28, 2023

@YTGhost Can I know how to do this?

If there is no way to get it automatically, you can also get it manually and set the PublicIP field directly in the CR.

@chunfungintel
Hi, sorry, I've been very busy the last couple of days. I'll check it out tonight.
@River-sh Can you help with this issue?

@YTGhost
Member

YTGhost commented Dec 28, 2023

@YTGhost Can I know how to do this?

If there is no way to get it automatically, you can also get it manually and set the PublicIP field directly in the CR.

To get the public IP manually, you can query a public API, for example https://ifconfig.me/.
Once you have the public IP, check the Gateway CR; you will find the publicIP field there.
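
The manual lookup can be sketched as a small script that tries several lookup services in turn and accepts the first answer that parses as an IP address. This is only an illustration under assumptions: the function names are hypothetical, and the caller supplies the service URLs (each is assumed to return the address as plain text).

```python
import ipaddress
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_text(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return its body as text (urllib honours the proxy env vars)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def first_public_ip(urls: Iterable[str],
                    fetch: Callable[[str], str] = fetch_text) -> Optional[str]:
    """Return the first response that parses as an IP address, or None if all fail."""
    for url in urls:
        try:
            candidate = fetch(url).strip()
            ipaddress.ip_address(candidate)  # raises ValueError for non-IP text
            return candidate
        except (OSError, ValueError):
            continue  # service unreachable or returned something else: try the next
    return None
```

A call such as `first_public_ip(["https://ifconfig.me/ip"])` (the `/ip` path is an assumption about that service) returns the address as a string, or `None` when every service is blocked. The result can then be copied into the publicIP field by hand.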

@River-sh
Contributor

@YTGhost Can I know how to do this?

If there is no way to get it automatically, you can also get it manually and set the PublicIP field directly in the CR.

Please refer to the documentation at https://openyurt.io/zh/docs/next/user-manuals/network/raven . You can set the field spec.endpoints.publicIP explicitly, for example to 129.xxx.xxx.xxx.
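
As a rough illustration of setting the field by hand, the sketch below builds a Gateway object with one tunnel endpoint and an explicit publicIP, and serializes it to JSON (which `kubectl apply -f -` accepts as well as YAML). The helper names are hypothetical, the field names are taken from the Gateway manifests posted in this thread, and the address must be replaced with a real one.

```python
import json
from typing import Any, Dict


def gateway_manifest(name: str, node_name: str, public_ip: str,
                     port: int = 4500, under_nat: bool = False) -> Dict[str, Any]:
    """Build a Gateway object with a single tunnel endpoint and an explicit publicIP."""
    return {
        "apiVersion": "raven.openyurt.io/v1beta1",
        "kind": "Gateway",
        "metadata": {"name": name},
        "spec": {
            "endpoints": [{
                "nodeName": node_name,
                "type": "tunnel",
                "port": port,
                "underNAT": under_nat,
                "publicIP": public_ip,  # set by hand instead of auto-detected
            }],
        },
    }


def to_json(manifest: Dict[str, Any]) -> str:
    """Serialize the manifest so it can be piped to `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

For instance, `print(to_json(gateway_manifest("gw-cloud", "adl-cloud-node", "203.0.113.10")))` prints a manifest in which 203.0.113.10 is only a placeholder from the documentation address range.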

@chunfungintel
Author

Hi @River-sh @YTGhost

This is my test topology and gateway configuration; please advise.
For the edge node, which PublicIP should I use?

graph 
B("Control-Panel (adl-cloud-node)")
    B ---|10.226.xx.xx/23| C{Router}
    C ---|192.168.1.100/24| D["Edge (adl-edge-node)"]
    C ---|192.168.1.200/24| E["Edge (adl-edge-node-2)"]
kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud
cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  proxyConfig:
    Replicas: 1
    proxyHTTPPort: 10255,9445
    proxyHTTPSPort: 10250,9100
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
      port: 10262
      type: proxy
      publicIP: 10.226.xx.xx
    - nodeName: adl-cloud-node
      underNAT: false
      port: 4500
      type: tunnel
      publicIP: 10.226.xx.xx
EOF
kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge
cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-edge
spec:
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
      port: 4500
      type: tunnel
EOF

Logs from raven in edge node:

2024-01-02T22:33:07.857685277+08:00 stderr F I0102 14:33:07.857475       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
2024-01-02T22:33:07.857689261+08:00 stderr F I0102 14:33:07.857520       1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[gw-cloud:10.226.76.105 gw-rbf:10.107.249.110]
2024-01-02T22:33:07.857690312+08:00 stderr F I0102 14:33:07.857531       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
2024-01-02T22:33:07.86175371+08:00 stderr F I0102 14:33:07.861604       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
2024-01-02T22:33:07.901832865+08:00 stderr F I0102 14:33:07.901692       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-rbf
2024-01-02T22:33:30.581124834+08:00 stderr F I0102 14:33:30.580844       1 engine.go:121] "RavenEngine: updating gateway, gw-rbf"
2024-01-02T22:33:30.581136463+08:00 stderr F I0102 14:33:30.580854       1 engine.go:95] RavenEngine: enqueue gateway gw-rbf to tunnel queue
2024-01-02T22:33:30.58114176+08:00 stderr F I0102 14:33:30.580860       1 engine.go:100] RavenEngine: enqueue gateway gw-rbf to proxy queue
2024-01-02T22:33:32.20392066+08:00 stderr F I0102 14:33:32.203602       1 engine.go:121] "RavenEngine: updating gateway, gw-rbf"
2024-01-02T22:33:32.203930395+08:00 stderr F I0102 14:33:32.203611       1 engine.go:95] RavenEngine: enqueue gateway gw-rbf to tunnel queue
2024-01-02T22:33:32.203931555+08:00 stderr F I0102 14:33:32.203616       1 engine.go:100] RavenEngine: enqueue gateway gw-rbf to proxy queue
2024-01-02T22:34:15.668335876+08:00 stderr F I0102 14:34:15.668216       1 engine.go:121] "RavenEngine: updating gateway, gw-rbf"
2024-01-02T22:34:15.668344997+08:00 stderr F I0102 14:34:15.668226       1 engine.go:95] RavenEngine: enqueue gateway gw-rbf to tunnel queue
2024-01-02T22:34:15.668346022+08:00 stderr F I0102 14:34:15.668232       1 engine.go:100] RavenEngine: enqueue gateway gw-rbf to proxy queue
2024-01-02T22:34:37.903910881+08:00 stderr F E0102 14:34:37.903642       1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-edge"
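
When the Raven log is long, the two public-IP symptoms visible above can be pulled out mechanically. The sketch below is a hypothetical helper, not a Raven tool; its regular expressions match only the two message formats that appear in the log lines of this thread.

```python
import re
from typing import Dict, Iterable, List

# Logged while a gateway has no public IP yet.
WAITING = re.compile(r'"no public IP for gateway, waiting for sync" gateway="([^"]+)"')
# Logged when none of the public-IP lookup services could be reached.
LOOKUP_FAILED = re.compile(r'"error config gateway public ip" err=.* gateway="([^"]+)"')


def public_ip_problems(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the gateway names mentioned in public-IP related log messages."""
    waiting, failed = set(), set()
    for line in lines:
        match = WAITING.search(line)
        if match:
            waiting.add(match.group(1))
        match = LOOKUP_FAILED.search(line)
        if match:
            failed.add(match.group(1))
    return {"waiting": sorted(waiting), "lookup_failed": sorted(failed)}
```

Feeding it the saved log, for example `public_ip_problems(open("raven.log"))` with a hypothetical file name, gives one list of gateways still waiting for a public IP and one list of gateways whose lookup failed outright.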

Apparently my corporate network blocks the use of STUN; checking with pystun3 gives:

pystun3
NAT Type: Blocked
External IP: None
External Port: None
Press any key to continue

@chunfungintel
Author

Update:
I managed to get "kubectl logs" working with the configuration below:

apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  endpoints:
  - nodeName: adl-cloud-node
    port: 4500
    type: tunnel
    publicIP: LOCAL_NETWORK_IP
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true

and by setting the correct proxy variables on the raven-agent-ds DaemonSet:

kubectl set env -n kube-system daemonset raven-agent-ds http_proxy=${http_proxy}
kubectl set env -n kube-system daemonset raven-agent-ds https_proxy=${https_proxy}
kubectl set env -n kube-system daemonset raven-agent-ds no_proxy=${no_proxy}
kubectl set env -n kube-system daemonset raven-agent-ds HTTP_PROXY=${HTTP_PROXY}
kubectl set env -n kube-system daemonset raven-agent-ds HTTPS_PROXY=${HTTPS_PROXY}
kubectl set env -n kube-system daemonset raven-agent-ds NO_PROXY=${NO_PROXY}
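
The six commands above can also be collapsed into one, since `kubectl set env` accepts several KEY=VALUE pairs in a single call. The sketch below (the function name is hypothetical) builds that command from an environment mapping and skips variables that are unset or empty, so that no variable ends up being set to an empty string on the DaemonSet.

```python
import shlex
from typing import List, Mapping

PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy",
              "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def set_env_command(env: Mapping[str, str],
                    namespace: str = "kube-system",
                    daemonset: str = "raven-agent-ds") -> List[str]:
    """Build one `kubectl set env` call that carries every non-empty proxy variable."""
    pairs = [f"{name}={env[name]}" for name in PROXY_VARS if env.get(name)]
    if not pairs:
        raise ValueError("no proxy variables are set in the given environment")
    return ["kubectl", "set", "env", "-n", namespace, f"daemonset/{daemonset}", *pairs]


def as_shell(command: List[str]) -> str:
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

Calling `as_shell(set_env_command(os.environ))` (after `import os`) prints the command for review; it could equally be passed to `subprocess.run`. A single call should update the pod template once rather than six times.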

Thanks a lot for your support!

@River-sh
Contributor

You don't need this complicated configuration. You only need to enable Raven's tunnel mode and configure a correct Gateway CR, as described in https://openyurt.io/zh/docs/user-manuals/network/raven/ . yurt-manager will then elect the active endpoints and record them in Gateway.Status.ActiveEndpoints.

You can run kubectl get gw gw-cloud -o yaml to verify that the gateway node has been elected.
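
The same check can be scripted against the JSON form of the object (`kubectl get gw gw-cloud -o json`). This is a minimal sketch under one assumption: that Gateway.Status.ActiveEndpoints is serialized under the key `status.activeEndpoints` and that each entry carries a `nodeName`. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def active_endpoints(gateway: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return status.activeEndpoints, or an empty list if nothing was elected yet."""
    return (gateway.get("status") or {}).get("activeEndpoints") or []


def elected_nodes(gateway: Dict[str, Any]) -> List[str]:
    """Return the distinct node names that appear among the active endpoints."""
    names = {ep.get("nodeName", "") for ep in active_endpoints(gateway)}
    return sorted(names - {""})


def elected_nodes_from_json(text: str) -> List[str]:
    """Convenience wrapper for the raw output of `kubectl get gw <name> -o json`."""
    return elected_nodes(json.loads(text))
```

An empty list from `elected_nodes_from_json` would mean that no gateway node has been elected yet for that Gateway.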

@chunfungintel
Author

@River-sh Thank you, I will try and let you know.

@qpanpony

I have been troubled by the same problem for several days, though my network environment may be a bit different from @chunfungintel's.
Both my control-plane nodes and my edge nodes are behind NAT. I can join edge nodes successfully (using yurtadm join k8s-api-server-PublicIP:Port --token xxxxx --discovery-token-ca-cert-hash xxxxxx --node-type=edge, where k8s-api-server-PublicIP:Port is mapped to PrivateIP:6443 in the cloud). I can also deploy a busybox workload to the edge nodes, but I cannot run "kubectl exec/logs" against pods running on the edge nodes.

How could I configure Gateway CR correctly when both control-plane nodes and edge nodes are behind NAT?

@River-sh
Contributor

River-sh commented Apr 24, 2024

You can expose the control plane's gateway node on the public network (configure DNAT on the NAT device so that UDP port 4500 of this gateway node is reachable) and set underNAT=false on that Gateway. Alternatively, you can set underNAT=true to test whether NAT traversal can build a VPN between the two gateway nodes. You can let raven-agent enable NAT traversal, but not all NATs can be traversed.

@qpanpony

I followed the same revised steps, except that I used raven-agent 0.4.1.

Revised steps:

Control-plane initialization:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl taint nodes --all node-role.kubernetes.io/master-

Using OpenYurt 1.4.0 + Raven agent 0.4.0

helm upgrade --install yurt-manager -n kube-system openyurt/yurt-manager --version 1.4.0 --set image.tag=latest
helm upgrade --install yurt-hub -n kube-system --set kubernetesServerAddr=https://${KUBERNETES_SERVER_ADDRESS}:6443 openyurt/yurthub --version 1.4.0
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true \
--set image.tag=0.4.0 --version 0.4.0

Install OpenYurt 1.4 on the edge node:

wget https://github.com/openyurtio/openyurt/releases/download/v1.4.0/yurtadm-v1.4.0-linux-amd64.tar.gz
tar -xvf yurtadm-v1.4.0-linux-amd64.tar.gz
sudo cp linux-amd64/yurtadm /usr/local/bin/yurtadm && sudo chmod +x /usr/local/bin/yurtadm

Edge node joining:

sudo yurtadm join \
${CONTROL_PANEL_ADDRESS}:6443 \
--token=${JOIN_TOKEN} --node-type=edge \
--cri-socket=unix:///run/containerd/containerd.sock \
--discovery-token-ca-cert-hash=${CA_HASH} --v=5

Gateway configuration:

kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge; \
kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud

cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
EOF

git clone https://github.com/openyurtio/raven.git
cd raven && git checkout v0.4.0
make deploy

Results: Still unable to do 'kubectl logs'

Anything still missing?

@River-sh
Contributor

River-sh commented Apr 24, 2024

As you said, your cloud nodes cannot be reached from the public network, so cross-network domain VPNs cannot be established and kubectl logs/exec cannot be used.

@River-sh
Contributor

@qpanpony You can read this document step by step. https://openyurt.io/zh/docs/user-manuals/network/raven

@qpanpony

qpanpony commented Jun 4, 2024

Just some feedback: I stopped using the raven-agent component because cross-network domain VPNs cannot be established in my network environment. I have deployed edgemesh instead, which provides the ability to communicate across subnets based on a LibP2P tunnel.
