Question: Migrate init-node from CustomMachine to Amazonec2Machine #45426

Open
kgtw opened this issue May 9, 2024 · 1 comment

kgtw commented May 9, 2024

Environmental Info:
RKE2 Version: v1.27.6+rke2r1
Rancher Version: v2.8.3

Cluster Configuration:

  • self-registering CustomMachine worker nodes, backed by a per-az ASG deployment in AWS.
  • 3x control-plane + etcd CustomMachine nodes, backed by a per-az ASG deployment in AWS (these are what we want to remove).
  • 3x control-plane + etcd Amazonec2Machine nodes for a per-az deployment, managed by Rancher.

Context:
We currently have a custom setup of both control-plane/etcd nodes and worker nodes backed by AWS ASGs. As part of our company's policy for security and patching upgrades, we need to frequently roll out new AMIs. This approach works extremely well for our "worker" nodes, where we have configured the AWS ASG with an instance TTL of 3 days (see the example command below).
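For illustration, this is roughly how such a TTL can be applied to a worker ASG (the ASG name is a placeholder; 259200 seconds = 3 days):

$ aws autoscaling update-auto-scaling-group --auto-scaling-group-name <worker-asg-name> --max-instance-lifetime 259200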

When it comes to the control-plane/etcd nodes it's slightly more problematic, because of the destruction of the "init-node" and Rancher's inability to re-designate a new init-node when the previous one has been deleted.

To mitigate this we have moved to using Amazonec2Machine managed node pools for our control-plane/etcd nodes, where Rancher maintains the lifecycle of those nodes and can gracefully re-assign an existing control-plane/etcd node to be the new init-node.

How to migrate?

For a large portion of our clusters the init-node is currently assigned to a CustomMachine control-plane/etcd node managed by an AWS ASG, and we want to move it to an Amazonec2Machine instance managed by Rancher.

This is the approach we have validated and are hoping to perform for all clusters:

  1. Retrieve the cattle-id from the instance:

$ cat /etc/rancher/agent/cattle-id
4a8d613dcd212daa87ef31f8964870e9fc10e94b8a506d439c1b8f9c57d6507

  2. Identify the "machine plan" secret resource name for the Rancher-managed control-plane/etcd instance that we want to become the new init-node: it is the value of .spec.bootstrap.configRef.name on the corresponding machine.cluster.x-k8s.io resource (see the example command below).
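For example, something like the following should print the plan secret name for the target machine (the machine name here is a placeholder):

$ kubectl get machines.cluster.x-k8s.io -n fleet-default <machine-name> -o jsonpath='{.spec.bootstrap.configRef.name}'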

  3. Add the rke.cattle.io/machine-id label to the machine plan secret resource from step 2:

$ kubectl label secret -n fleet-default <resource-name> rke.cattle.io/machine-id=4a8d613dcd212daa87ef31f8964870e9fc10e94b8a506d439c1b8f9c57d6507

  4. Update the clusters.provisioning.cattle.io resource with the rke.cattle.io/init-node-machine-id label:

$ kubectl label clusters.provisioning.cattle.io -n fleet-default <cluster> rke.cattle.io/init-node-machine-id=4a8d613dcd212daa87ef31f8964870e9fc10e94b8a506d439c1b8f9c57d6507

At this point, Rancher automatically starts updating all nodes and reconfiguring them to connect to the init-node that we have defined.
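Before relying on that rollout, we sanity-check that the labels landed where expected (resource names are placeholders for our environment):

$ kubectl get secret -n fleet-default <resource-name> --show-labels
$ kubectl get clusters.provisioning.cattle.io -n fleet-default <cluster> --show-labels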

We are setting the rke.cattle.io/machine-id label on the machine plan secret because it is used within the following function to select/filter the nodes that are eligible to become the init-node.

https://github.com/rancher/rancher/blob/release/v2.8/pkg/capr/planner/initnode.go#L48-L54
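For reference, the plan secrets that carry this label (and are therefore candidates in that selection) can be listed with a label-existence selector:

$ kubectl get secrets -n fleet-default -l rke.cattle.io/machine-id -o name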

Questions

  • We noticed that for CustomMachine nodes the label rke.cattle.io/machine-id is set, whereas for Amazonec2Machine nodes the label is absent. Is this expected, or a bug?
  • As a follow-up: by setting the rke.cattle.io/machine-id label on the Amazonec2Machine nodes, are we potentially breaking some other functionality?
  • Is the process I've outlined above suitable for forcing a new node to be the "init-node"? There seems to be a lack of operational tooling to handle such a use case.
brandond (Contributor) commented May 9, 2024

I am moving this to rancher/rancher, as cluster provisioning is not part of RKE2.

brandond transferred this issue from rancher/rke2 on May 9, 2024