Cluster Configuration:
- Self-registering `CustomMachine` worker nodes, backed by a per-AZ ASG deployment in AWS.
- 3x control-plane + etcd `CustomMachine` nodes, backed by a per-AZ ASG deployment in AWS (what we are wanting to remove).
- 3x control-plane + etcd `Amazonec2Machine` nodes for a per-AZ deployment, managed by Rancher.
Context:
We currently have a custom setup of both control-plane/etcd nodes and worker nodes backed by AWS ASGs. As part of our company's security and patching policy we need to frequently roll out new AMIs. This approach works extremely well for our worker nodes, where we have configured the AWS ASG with an instance TTL of 3 days.
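As a point of reference, here is a minimal sketch of how such a TTL can be set, assuming it is implemented via the ASG's maximum instance lifetime (the group name below is a placeholder):

```bash
# Hypothetical ASG name; --max-instance-lifetime is specified in seconds (3 days = 259200).
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-cluster-workers-us-east-1a \
  --max-instance-lifetime 259200
```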
When it comes to the control-plane/etcd nodes it's slightly more problematic. This is because of the destruction of the "init-node", and Rancher's inability to re-designate a new init-node when the previous one has been deleted.
To mitigate this we have moved to using `Amazonec2Machine` managed node pools for our control-plane/etcd nodes, where Rancher maintains the lifecycle of those nodes and can gracefully reassign an existing control-plane/etcd node to be the new init-node.
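For illustration, the Rancher-managed pools can be inspected on the provisioning cluster object; the cluster name, the `fleet-default` namespace, and the use of `jq` are assumptions here:

```bash
# List the machine pools declared on the provisioning cluster resource.
kubectl -n fleet-default get clusters.provisioning.cattle.io my-cluster \
  -o jsonpath='{.spec.rkeConfig.machinePools}' | jq .
```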
How to migrate?
For a large portion of our clusters the init-node is currently assigned to a `CustomMachine` control-plane/etcd node that is managed by an AWS ASG, and we want to move it to an `Amazonec2Machine` instance managed by Rancher.
This is the current approach that we have validated, and are hoping to perform for all clusters:
1. Obtain the `cattle-id` from the instance.
2. Identify the "machine plan" secret resource name for the Rancher managed control-plane/etcd instance that we want to be the new init-node.
   2.1 Use the resource name from `.spec.bootstrap.configRef.name` on the `machine.cluster.x-k8s.io` resource.
3. Add the `rke.cattle.io/machine-id` label to the machine plan secret resource from 2.1.
4. Update the `clusters.provisioning.cattle.io` resource with the `rke.cattle.io/init-node-machine-id` label (see the kubectl sketch below).
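A hedged kubectl sketch of steps 2-4; the `fleet-default` namespace, the object names, and the `<bootstrap-name>-machine-plan` secret naming are assumptions based on a typical Rancher v2 provisioning setup:

```bash
NS=fleet-default
CLUSTER=my-cluster
MACHINE=my-cluster-control-plane-abc123   # the Amazonec2Machine-backed Machine
MACHINE_ID="cattle-id-from-step-1"        # placeholder for the cattle-id obtained in step 1

# Step 2 / 2.1: resolve the bootstrap config name referenced by the Machine;
# the machine plan secret is assumed to be named "<bootstrap-name>-machine-plan".
BOOTSTRAP=$(kubectl -n "$NS" get machine.cluster.x-k8s.io "$MACHINE" \
  -o jsonpath='{.spec.bootstrap.configRef.name}')
PLAN_SECRET="${BOOTSTRAP}-machine-plan"

# Step 3: add the machine-id label to the machine plan secret.
kubectl -n "$NS" label secret "$PLAN_SECRET" \
  "rke.cattle.io/machine-id=${MACHINE_ID}" --overwrite

# Step 4: point the provisioning cluster at the new init-node.
kubectl -n "$NS" label clusters.provisioning.cattle.io "$CLUSTER" \
  "rke.cattle.io/init-node-machine-id=${MACHINE_ID}" --overwrite
```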
At this point, Rancher automatically starts updating all nodes and reconfiguring them to connect to the init-node that we have defined.
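To observe the rollout, something like the following can be used (again assuming the `fleet-default` namespace):

```bash
# Watch the CAPI Machines while Rancher reconciles them against the new init-node.
kubectl -n fleet-default get machines.cluster.x-k8s.io -w
```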
The reason why we are setting the `rke.cattle.io/machine-id` label on the machine plan is that it's used within the following function to select/filter eligible nodes that are suitable for being made an init-node: https://github.com/rancher/rancher/blob/release/v2.8/pkg/capr/planner/initnode.go#L48-L54
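For context, one hedged way to see which machine plan secrets currently carry the label; the `fleet-default` namespace, the `-machine-plan` name suffix, and the `rke.cattle.io/init-node` marker label are assumptions on my side, not something confirmed by the linked code:

```bash
# Show the machine-id and init-node label columns for the machine plan secrets.
kubectl -n fleet-default get secrets \
  -L rke.cattle.io/machine-id,rke.cattle.io/init-node | grep machine-plan
```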
Questions

We noticed that for `CustomMachine` nodes the label `rke.cattle.io/machine-id` is set, whereas for `Amazonec2Machine` nodes the label is absent (see the check sketched after these questions). Is this expected, or a bug?

Follow-up: by setting the `rke.cattle.io/machine-id` label on the `Amazonec2Machine` nodes, are we potentially breaking some other functionality?

Is the process I've outlined above suitable for forcing a new node to be an "init-node"? There seems to be a lack of operational tooling to handle such a use-case.
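A hedged way to check that observation across the CAPI Machine objects (namespace assumed to be `fleet-default`):

```bash
# Print each Machine with an extra column for the rke.cattle.io/machine-id label (empty when absent).
kubectl -n fleet-default get machines.cluster.x-k8s.io -L rke.cattle.io/machine-id
```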
Environmental Info:
RKE2 Version: v1.27.6+rke2r1
Rancher Version: v2.8.3