After HTTP fault executed finish, the pod cannot be accessed. "Received a netlink error message Network is unreachable (os error 101)" #4369

Open
Kurtcobainzl opened this issue Mar 14, 2024 · 2 comments

Kurtcobainzl commented Mar 14, 2024

Bug Report

What version of Kubernetes are you using?

Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.22.3-aliyun.1
WARNING: version difference between client (1.29) and server (1.22) exceeds the supported minor version skew of +/-1

By the way, my CNI is provided by Alibaba Cloud's Terway.

What version of Chaos Mesh are you using?

Controller manager Version: version.Info{GitVersion:"v2.6.2-dev-gdb6d384d41ca27", GitCommit:"db6d384d41ca27713262bf35215b52e087dc3f6d", BuildDate:"2024-03-08T14:14:35Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

What did you do? / Minimal Reproducible Example

I tried to run an HTTP fault experiment with the following configuration:

kind: HTTPChaos
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: chaos-a-okex
  name: test-kurt-0314-1946
  annotations:
    experiment.chaos-mesh.org/pause: 'true'
spec:
  selector:
    namespaces:
      - chaos-a-okex
    labelSelectors:
      app/name: ****-*****-****
  mode: all
  target: Request
  delay: 1000ms
  port: 6998
  path: '*'
  duration: 5m

When I ran the experiment, the fault behaved as expected. But after I stopped the experiment, the pod that had been injected with the fault was no longer accessible.

Here are the details:

  1. Before injection
    image (1)

  2. Fault injection complete
    image

  3. Stop the fault injection
    step3
    As you can see, once the fault is stopped, the pod's route information is no longer available.
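
The three steps above compare the pod's routes before, during, and after the experiment. Since the screenshots are not reproduced here, the following is a minimal sketch of how that comparison could be done on text snapshots, assuming the output of `ip route show` was captured inside the pod at each step. The sample snapshots are hypothetical: only the default route via 169.254.1.1 appears in the log of this issue, and the link-scope route and the `eth0` name in the sample are assumptions.

```python
def parse_routes(ip_route_output):
    """Normalise `ip route show` output into a set of route lines."""
    return {
        " ".join(line.split())
        for line in ip_route_output.splitlines()
        if line.strip()
    }


def missing_routes(before, after):
    """Return routes present in `before` but absent from `after`."""
    return sorted(parse_routes(before) - parse_routes(after))


# Hypothetical snapshots, for illustration only.
BEFORE_INJECTION = """\
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
"""
AFTER_STOP = ""  # the report says route information is gone after stopping

for route in missing_routes(BEFORE_INJECTION, AFTER_STOP):
    print("missing after recovery:", route)
```

Attaching the raw text of each snapshot, rather than only a screenshot, would make this kind of diff straightforward for whoever tries to reproduce the problem.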

error log:
2024-03-14T19:49:20.951+0800 INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 applying http chaos {"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz"}
2024-03-14T19:49:20.951+0800 INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 ApplyHttpChaosin uid ok {"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz"}
2024-03-14T19:49:20.951+0800 INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 applying http chaos-if in.InstanceUid ==ni{"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz", "in": "rules:\"[]\" container_id:\"containerd://307891188f2eb64d66517a623020cd548a511b78d30990c41726fcacf39e1127\" instance:4092977 startTime:1710416830000 enterNS:true instance_uid:\"97cee125-e8f4-4cfe-86ef-92f923e8684e\""}
2024-03-14T19:49:20.951+0800 INFO chaos-daemon.daemon-server pb/chaosdaemon.pb.go:4415 applyHttpChaos-the length of actions {"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz", "length": 0}
2024-03-14T19:49:20.952+0800 INFO chaos-daemon.daemon-server pb/chaosdaemon.pb.go:4415 ready to apply {"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz", "config": "{\"rules\":[]}"}
2024-03-14T19:49:20.952+0800 INFO chaos-daemon.daemon-server pb/chaosdaemon.pb.go:4415 applyHttpChaos {"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz", "reqError": "json: unsupported type: func() (io.ReadCloser, error)"}
2024-03-14T11:49:20.997120Z INFO chaos_tproxy::proxy::exec: Proxy executor killing sub process
2024-03-14T11:49:20.997229Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/fc73155d-e075-4fec-a360-0dac7ba255e1.sock", verbose: 2 }, net_env: NetEnv { netns: "fa0ee8cf-3fffns", device: "eth0", ip: "10.254.123.145/32", bridge1: "fa0ee8cf-3fffb1", bridge2: "fa0ee8cf-3fffb2", veth1: "fa0ee8cf-3fffv1", veth2: "veth0", veth3: "veth1", veth4: "fa0ee8cf-3fffv4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x5597daf69710, tail: UnsafeCell { .. } }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) }), rx: None, task: Some(JoinHandle { id: Id(10) }) } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "12"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::routes: can not recover ROUTE MSG: RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }, error: Received a netlink error message Network is unreachable (os error 101)
2024-03-14T11:49:20.997274Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/fc73155d-e075-4fec-a360-0dac7ba255e1.sock", verbose: 2 }, net_env: NetEnv { netns: "fa0ee8cf-3fffns", device: "eth0", ip: "10.254.123.145/32", bridge1: "fa0ee8cf-3fffb1", bridge2: "fa0ee8cf-3fffb2", veth1: "fa0ee8cf-3fffv1", veth2: "veth0", veth3: "veth1", veth4: "fa0ee8cf-3fffv4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x5597daf69710, tail: UnsafeCell { .. } }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) }), rx: None, task: Some(JoinHandle { id: Id(10) }) } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "12"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::bridge: Local IP address not found
2024-03-14T11:49:20.997286Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/fc73155d-e075-4fec-a360-0dac7ba255e1.sock", verbose: 2 }, net_env: NetEnv { netns: "fa0ee8cf-3fffns", device: "eth0", ip: "10.254.123.145/32", bridge1: "fa0ee8cf-3fffb1", bridge2: "fa0ee8cf-3fffb2", veth1: "fa0ee8cf-3fffv1", veth2: "veth0", veth3: "veth1", veth4: "fa0ee8cf-3fffv4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x5597daf69710, tail: UnsafeCell { .. } }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) }), rx: None, task: Some(JoinHandle { id: Id(10) }) } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "12"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::bridge: Local IP address not found
2024-03-14T11:49:20.997293Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/fc73155d-e075-4fec-a360-0dac7ba255e1.sock", verbose: 2 }, net_env: NetEnv { netns: "fa0ee8cf-3fffns", device: "eth0", ip: "10.254.123.145/32", bridge1: "fa0ee8cf-3fffb1", bridge2: "fa0ee8cf-3fffb2", veth1: "fa0ee8cf-3fffv1", veth2: "veth0", veth3: "veth1", veth4: "fa0ee8cf-3fffv4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x5597daf69710, tail: UnsafeCell { .. } }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) }), rx: None, task: Some(JoinHandle { id: Id(10) }) } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "12"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::bridge: Local IP address not found
2024-03-14T11:49:20.997300Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/fc73155d-e075-4fec-a360-0dac7ba255e1.sock", verbose: 2 }, net_env: NetEnv { netns: "fa0ee8cf-3fffns", device: "eth0", ip: "10.254.123.145/32", bridge1: "fa0ee8cf-3fffb1", bridge2: "fa0ee8cf-3fffb2", veth1: "fa0ee8cf-3fffv1", veth2: "veth0", veth3: "veth1", veth4: "fa0ee8cf-3fffv4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x5597daf69710, tail: UnsafeCell { .. } }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) }), rx: None, task: Some(JoinHandle { id: Id(10) }) } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "12"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::bridge: Local IP address not found
2024-03-14T11:49:20.997307Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/fc73155d-e075-4fec-a360-0dac7ba255e1.sock", verbose: 2 }, net_env: NetEnv { netns: "fa0ee8cf-3fffns", device: "eth0", ip: "10.254.123.145/32", bridge1: "fa0ee8cf-3fffb1", bridge2: "fa0ee8cf-3fffb2", veth1: "fa0ee8cf-3fffv1", veth2: "veth0", veth3: "veth1", veth4: "fa0ee8cf-3fffv4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([169, 254, 1, 1]), Oif(2)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x5597daf69710, tail: UnsafeCell { .. } }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) }), rx: None, task: Some(JoinHandle { id: Id(10) }) } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "12"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::bridge: Local IP address not found
2024-03-14T19:49:20.997+0800 INFO chaos-daemon.daemon-server pb/chaosdaemon.pb.go:4415 http chaos applied {"namespacedName": "chaos-a-okex/***-5559c874b4-hgjpz"}
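
The `RouteMessage` that chaos-tproxy reports it cannot recover is easier to reason about when its numeric fields are decoded. The sketch below maps the values from the log to their rtnetlink names (`AF_INET` = 2, `RT_TABLE_MAIN` = 254, `RTPROT_BOOT` = 3, `RT_SCOPE_UNIVERSE` = 0) and renders a roughly equivalent `ip route add` command. The interface name `eth0` is an assumption taken from `net_env.device`; the log itself only gives the interface index `Oif(2)`. This is an illustration of what the saved route means, not chaos-tproxy's own recovery code.

```python
import ipaddress

# Field values copied from the RouteMessage in the log.
SAVED_ROUTE = {
    "address_family": 2,
    "destination_prefix_length": 0,
    "table": 254,
    "protocol": 3,
    "scope": 0,
    "kind": 1,
    "gateway": [169, 254, 1, 1],
    "oif": 2,
}

# Names as printed by iproute2 for the rtnetlink constants.
FAMILIES = {2: "inet", 10: "inet6"}
TABLES = {253: "default", 254: "main", 255: "local"}
PROTOCOLS = {2: "kernel", 3: "boot", 4: "static"}
SCOPES = {0: "global", 253: "link", 254: "host"}


def to_ip_route_command(route, ifname):
    """Render an IPv4 default route as an `ip route add` command line."""
    if FAMILIES[route["address_family"]] != "inet":
        raise ValueError("this sketch only handles IPv4 routes")
    if route["destination_prefix_length"] != 0:
        raise ValueError("this sketch only handles the default route")
    gateway = ipaddress.IPv4Address(bytes(route["gateway"]))
    return (
        f"ip route add default via {gateway} dev {ifname} "
        f"table {TABLES[route['table']]} "
        f"proto {PROTOCOLS[route['protocol']]} "
        f"scope {SCOPES[route['scope']]}"
    )


# "eth0" is assumed; Oif(2) in the log is an index, not a name.
print(to_ip_route_command(SAVED_ROUTE, "eth0"))
```

So the failed step is re-adding the pod's default route via 169.254.1.1. One common reason for the kernel to answer such a request with "Network is unreachable" is that it has no on-link route to the gateway at that moment. Whether that is what happened here is not established in this thread, and `save_routes` in the log lists only the default route.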

screenshot-20240314-201719

What did you expect to see?
Requests to the target pod should recover once the experiment is stopped.
What did you see instead?

Output of chaosctl

@STRRL STRRL self-assigned this Mar 19, 2024
Member

STRRL commented Mar 19, 2024

Hi @Kurtcobainzl, it seems that something goes wrong when recovering from the HTTPChaos, but I am not sure of the root cause.

Could you provide more information like:

  • CNI plugin
  • Linux Distro and Linux kernel version

to help us reproduce and profile this issue?

Thanks!

Author

Kurtcobainzl commented Mar 21, 2024

Thanks, here is the information I collected:
CNI: Terway, version 0.3.1

Linux Distro and Linux kernel version:

cat /proc/version
Linux version 5.10.134-15.1.al8.x86_64 (mockbuild@kojid011139182114.na61) (gcc (GCC) 10.2.1 20200825 (Alibaba 10.2.1-3.5 2.32), GNU ld version 2.35-12.2.al8) #1 SMP Wed Aug 16 11:40:37 CST 2023

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

uname -r
5.10.134-15.1.al8.x86_64

uname -a
Linux okcoin-market-service-5559c874b4-xmmlr 5.10.134-15.1.al8.x86_64 #1 SMP Wed Aug 16 11:40:37 CST 2023 x86_64 x86_64 x86_64 GNU/Linux

If you need any more information, please let me know.
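
A note on reading the output above: the hostname in the `uname -a` line looks like a pod name, which suggests the commands were run inside the container. If so, `/etc/os-release` describes the container image, while the kernel version is that of the node. The sketch below parses both pieces of text from this comment into one summary line so the two are not conflated. The helper name `parse_os_release` is made up for this example.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips the surrounding quotes
        info[key] = parts[0] if parts else ""
    return info


# Excerpt of the os-release output and the `uname -r` value from this comment.
OS_RELEASE = '''NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
VERSION_ID="7"
'''
KERNEL = "5.10.134-15.1.al8.x86_64"

info = parse_os_release(OS_RELEASE)
summary = f"userspace: {info['NAME']} {info['VERSION_ID']}, kernel: {KERNEL}"
print(summary)
```

Running `cat /etc/os-release` on the node itself would confirm the node's distribution.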
