Rolling upgrades for the control plane #3540

Open
nicolaiort-datev opened this issue Jan 23, 2025 · 8 comments
Labels
  • customer-request
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/blocked: Denotes an issue or PR is blocked.
  • sig/cluster-management: Denotes a PR or issue as being assigned to SIG Cluster Management.

Comments

@nicolaiort-datev

nicolaiort-datev commented Jan 23, 2025

Description of the feature you would like to add / User story

As KKP administrators utilizing KubeOne for our master and seed clusters,
we would like to upgrade the master nodes via automated rolling upgrades
in order to allow a stable and reproducible transition to new base images and Kubernetes versions. The rolling upgrade process should be immutable and replace one master node after another.

Solution details

The basic idea follows the rolling, immutable upgrade mechanism used for the worker-node MachineDeployments.

Requirements:

  • Rolling upgrades for (master) node base images
  • Rolling upgrades for Kubernetes versions (control plane), including etcd
  • Master nodes should be treated as immutable as well (no in-place upgrade!)

Alternative approaches

The current approach for base image upgrades involves manually upgrading/recreating each master, one by one, via Terraform and running kubeone apply after each terraform apply to add the new node to the cluster.
This approach technically works but is not automated, which makes manual mistakes likely. It also does not follow the add-first rolling upgrade philosophy (maxSurge: X, maxUnavailable: 0).
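
To illustrate the add-first philosophy mentioned above, here is a minimal simulation sketch. It is not KubeOne or machine-controller code: the `Node` type, the `rolling_replace` function, the `max_surge` parameter, and the node and image names are all hypothetical. It only shows that, when replacements are created before old nodes are removed, the node count never drops below the desired size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    image: str


def rolling_replace(nodes, new_image, max_surge=1):
    """Replace every outdated node add-first (hypothetical helper).

    Up to `max_surge` replacements are created before the same number
    of old nodes is removed, so availability never drops.
    """
    desired = len(nodes)
    current = list(nodes)
    history = [len(current)]  # node count after each step
    old = [n for n in current if n.image != new_image]
    generation = 0
    while old:
        batch = old[:max_surge]
        # 1. Add replacements first, so capacity stays at or above `desired`.
        for n in batch:
            generation += 1
            current.append(Node(f"{n.name}-r{generation}", new_image))
        history.append(len(current))
        # 2. Only then remove the outdated nodes of this batch.
        for n in batch:
            current.remove(n)
        history.append(len(current))
        old = old[max_surge:]
    # The "max unavailable: 0" property held throughout the rollout.
    assert min(history) >= desired
    return current, history


masters = [Node(f"cp-{i}", "ubuntu-20.04") for i in range(3)]
upgraded, counts = rolling_replace(masters, "ubuntu-22.04")
print([n.name for n in upgraded])
print(counts)
```

With three masters and a surge of one, the node count alternates between three and four and never falls below three. A real implementation would additionally have to wait for each new node (and its etcd member) to become healthy before removing an old one, which this sketch leaves out.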

Use cases

The main use case is automated upgrades of KKP master and seed clusters to reduce the potential for human error and enable automatic staging across environments.

Our secondary use case for KubeOne is the deployment of clusters that are not directly related to KKP but are used in our bootstrapping/disaster recovery. The ability to manage those clusters via our automation with an immutable and reproducible result would improve the reliability of these processes.

Additional information

Potential workflow for OS upgrade: The OS is immutable, therefore a new VM/node has to be created from a fresh base image to switch to a new OS version (e.g. Ubuntu 20.04 to 22.04).

@nicolaiort-datev added the kind/feature and sig/cluster-management labels Jan 23, 2025
@kron4eg added the lifecycle/blocked label Jan 23, 2025
@kron4eg
Member

kron4eg commented Jan 23, 2025

Hi!

Unfortunately, due to technical limitations of machine-controller and the stateful nature of etcd on control-plane nodes, this is currently not possible.
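
As background on why the stateful nature of etcd matters here: etcd needs a strict majority of its members (a quorum) to accept writes, so every control-plane node that is removed or added changes how many failures the cluster can tolerate. The following is a small worked example of that arithmetic only; the function names are illustrative and not part of any etcd or KubeOne API.

```python
def quorum(members: int) -> int:
    """Votes needed to commit a write: a strict majority of members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members may fail while the cluster stays writable."""
    return members - quorum(members)


for n in (1, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated failures={fault_tolerance(n)}")
```

For a three-member control plane, removing one member before its replacement has joined leaves two members with a quorum of two, so a single further failure would make etcd unavailable for writes. This is one reason a replacement procedure has to proceed strictly one node at a time.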

@thomasrootdv

Hi @kron4eg

What changes would be necessary to implement this feature? Is it possible at all?

@kron4eg
Member

kron4eg commented Jan 24, 2025

Is it possible at all

Nope, our machine-controller can only manage worker nodes. Control-plane nodes are managed differently (in KKP they are pods within a namespace, in KubeOne they are VMs managed over SSH).

@nicolaiort-datev
Author

Thank you for the quick response and feedback.

Unfortunately, due to technical limitations of machine-controller and the stateful nature of etcd on control-plane nodes, this is currently not possible.

I could have worded my initial issue description better: my idea was not necessarily to use machine-controller to implement some kind of rolling update mechanism. My initial ideas were:

  • KubeOne CLI: Implement the ability to update the Kubernetes version (and associated services) on one node, then the next, and so on, or simply allow targeting individual nodes. I haven't started my deep dive into the CLI's code yet, so I don't know how this would be done.
  • Machine-Controller: Allow 1-node setups that provision additional control-plane nodes via machine-controller (and kill the initial node afterwards)
  • Docs: Provide some examples of control-plane/static-worker updates in the documentation (afaik it only mentions the possibility without any step-by-step examples).

@toschneck
Member

/label customer-request

@toschneck
Member

As discussed, I will take this with me and see what options we have to find a solution. It is certainly not a small feature request, but I think it would be a great feature for KubeOne.

@kron4eg
Member

kron4eg commented Jan 27, 2025

For this to happen, KubeOne would need to "own" the process of VM creation. For now, this heavy lifting is done by Terraform. @toschneck, making this happen is not just "not a small feature"; it is huge.

@toschneck
Member

I know, but couldn't KubeOne control the Terraform process and execute the rolling VM creation through it? I'm not sure whether there is a Terraform Go binding as well, but for now we are only collecting ideas :-)
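
As a rough illustration of what driving Terraform from an outer tool could look like, here is a sketch of an external wrapper that automates the manual procedure described in the issue: recreate one control-plane VM, refresh the Terraform output, then run kubeone apply. It is not an existing KubeOne feature. The function names, the resource addresses, and the file names are assumptions for illustration, and the wrapper only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess


def replacement_plan(resource_addresses, manifest="kubeone.yaml", tf_json="tf.json"):
    """Return the commands to recreate control-plane VMs one at a time.

    `resource_addresses` are Terraform resource addresses; the ones used
    below are hypothetical and depend on the actual Terraform config.
    """
    plan = []
    for address in resource_addresses:
        # Recreate exactly one VM per iteration.
        plan.append(["terraform", "apply", "-auto-approve", f"-replace={address}"])
        # Export the updated infrastructure state for KubeOne.
        plan.append(["sh", "-c", f"terraform output -json > {shlex.quote(tf_json)}"])
        # Let KubeOne provision the fresh VM and join it to the cluster.
        plan.append(["kubeone", "apply", "--manifest", manifest,
                     "--tfjson", tf_json, "--auto-approve"])
    return plan


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical addresses; dry run only, nothing is executed.
run(replacement_plan(["aws_instance.control_plane[0]",
                      "aws_instance.control_plane[1]"]))
```

Note the limits of this sketch: `terraform apply -replace` destroys the old VM before creating the new one by default, so this is not add-first, and the wrapper performs no health checks and no cleanup of the old node or its etcd member between steps. Those gaps are part of why the feature is considered large.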
