Kubespray upgrade failed because etcd-event.service cannot start #11143

Open
bognarbalazs opened this issue Apr 30, 2024 · 0 comments · May be fixed by #11144
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

bognarbalazs commented Apr 30, 2024

What happened?

I ran the cluster.yml playbook to add a new control_plane member to the cluster (the old one was deleted and a new one was added with a newer OS version but the same IP). During the play it failed with the following:
Apr 3 10:34:44 k8smafunp001 etcd[13852]: {"level":"info","ts":"2024-04-03T10:34:44.399+0200","caller":"embed/etcd.go:306","msg":"starting an etcd server","etcd-version":"3.5.6","git-sha":"cecbe35ce","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":false,"name":"etcd3-events","data-dir":"/var/lib/etcd-events","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd-events/member","force-new-cluster":false,"heartbeat-interval":"250ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://x.x.x.x:2382"],"listen-peer-urls":["https://x.x.x.x:2382"],"advertise-client-urls":["https://x.x.x.x:2383"],"listen-client-urls":["https://x.x.x.x:2383","https://127.0.0.1:2383"],"listen-metrics-urls":[],"cors":[""],"host-whitelist":[""],"initial-cluster":"etcd1-events=https://x.x.x.x:2382,etcd2-events=https://x.x.x.x:2382,etcd3-events=https://x.x.x.x:2382","initial-cluster-state":"existing","initial-cluster-token":"k8s_events_etcd","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"8h0m0s","auto-compaction-interval":"8h0m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
Apr 3 10:34:44 k8smafunp001 etcd[13852]: {"level":"info","ts":"2024-04-03T10:34:44.399+0200","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd-events/member/snap/db","took":"288.751µs"}
Apr 3 10:34:44 k8smafunp001 etcd[13852]: {"level":"info","ts":"2024-04-03T10:34:44.413+0200","caller":"embed/etcd.go:373","msg":"closing etcd server","name":"etcd3-events","data-dir":"/var/lib/etcd-events","advertise-peer-urls":["https://x.x.x.x:2382"],"advertise-client-urls":["https://x.x.x.x:2383"]}
Apr 3 10:34:44 k8smafunp001 etcd[13852]: {"level":"info","ts":"2024-04-03T10:34:44.413+0200","caller":"embed/etcd.go:375","msg":"closed etcd server","name":"etcd3-events","data-dir":"/var/lib/etcd-events","advertise-peer-urls":["https://x.x.x.x:2382"],"advertise-client-urls":["https://x.x.x.x:2383"]}
Apr 3 10:34:44 k8smafunp001 etcd[13852]: {"level":"fatal","ts":"2024-04-03T10:34:44.413+0200","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:30135f3f9dc06005 Members:[&{ID:4eb24a8393e2c04b RaftAttributes:{PeerURLs:[https://x.x.x.x:2382] IsLearner:false} Attributes:{Name:etcd1-events ClientURLs:[https://x.x.x.x:2383]}} &{ID:8f3f7b987b892448 RaftAttributes:{PeerURLs:[https://x.x.x.x:2382] IsLearner:false} Attributes:{Name:etcd2-events ClientURLs:[https://x.x.x.x:2383]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:32\nruntime.main\n\truntime/proc.go:225"}
Apr 3 10:34:44 k8smafunp001 systemd[1]: etcd-events.service: Main process exited, code=exited, status=1/FAILURE
Apr 3 10:34:44 k8smafunp001 systemd[1]: etcd-events.service: Failed with result 'exit-code'.
Apr 3 10:34:44 k8smafunp001 systemd[1]: Failed to start etcd.
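
One possible reading of the fatal error, based only on the log above: the service is started with "initial-cluster-state":"existing" and an "initial-cluster" that names three members, while the member list returned by the running events cluster contains only etcd1-events and etcd2-events. The short Python sketch below makes that comparison explicit. The helper name `parse_initial_cluster` is hypothetical and used only for illustration, the values are copied from the log, and the log alone does not confirm why etcd3-events is absent from the existing cluster.

```python
def parse_initial_cluster(value: str) -> dict:
    """Split an etcd initial-cluster string into {member name: peer URL}."""
    members = {}
    for entry in value.split(","):
        name, _, url = entry.partition("=")
        members[name.strip()] = url.strip()
    return members


# Copied from the "starting an etcd server" log line (addresses are redacted there).
initial_cluster = (
    "etcd1-events=https://x.x.x.x:2382,"
    "etcd2-events=https://x.x.x.x:2382,"
    "etcd3-events=https://x.x.x.x:2382"
)

# Member names listed in the fatal "discovery failed" error.
registered_members = ["etcd1-events", "etcd2-events"]

configured = parse_initial_cluster(initial_cluster)
missing = sorted(set(configured) - set(registered_members))

print(f"configured: {len(configured)}, registered: {len(registered_members)}")
print(f"configured but not registered: {missing}")
```

With the values from this log, the configured list has three entries and the registered list has two, which matches the "member count is unequal" message; the unregistered name is etcd3-events, the node being re-added.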

What did you expect to happen?

I expected etcd-events.service to start and the new master node to be added to the cluster. Instead, etcd-events.service cannot start and the master node cannot be added.

How can we reproduce it (as minimally and precisely as possible)?

Remove a kube_control_plane node from the cluster and try to re-add it.

OS

Ubuntu 22.04.4 LTS

Version of Ansible

2.12

Version of Python

3.9

Version of Kubespray (commit)

2.22

Network plugin used

custom_cni

Full inventory with variables

Command used to invoke ansible

run from awx

Output of ansible run

{
    "msg": "Unable to start service etcd-events: Job for etcd-events.service failed because the control process exited with error code.\nSee \"systemctl status etcd-events.service\" and \"journalctl -xeu etcd-events.service\" for details.\n",
    "invocation": {
        "module_args": {
            "name": "etcd-events",
            "state": "started",
            "enabled": true,
            "daemon_reload": false,
            "daemon_reexec": false,
            "scope": "system",
            "no_block": false,
            "force": null,
            "masked": null
        }
    },
    "_ansible_no_log": false,
    "changed": false
}

Anything else we need to know

No response

bognarbalazs added the kind/bug label Apr 30, 2024
bognarbalazs changed the title from "Kubespray upgrade failed because etcd-event cannot start" to "Kubespray upgrade failed because etcd-event.service cannot start" Apr 30, 2024