Support for Cloud-Init network-config file in 'nocloud' format #51

Open
ricottatosta opened this issue Dec 28, 2023 · 37 comments

@ricottatosta

What steps did you take and what happened:
Sorry for submitting this as a bug; it may not be one.
When deploying a cluster with Talos as the bootstrap and control plane provider, Talos' init process finds a cloud-init drive but then complains about the network-config file. Talos' error says: "network-config metadata version=0 is not supported", maybe because the file starts with "network:". Is that supported? Reading the manual, it shouldn't be.

cloud-init manual

What did you expect to happen:
Maybe the cluster deployment should generate a network-config file without a top-level "network:" key.

Environment:

  • Cluster-api-provider-proxmox version: 0.1.0
  • Kubernetes version: (use kubectl version): 1.28.3
  • OS (e.g. from /etc/os-release): Talos 1.5.5
@ricottatosta ricottatosta added the kind/bug Something isn't working label Dec 28, 2023
@mcbenjemaa mcbenjemaa added kind/support and removed kind/bug Something isn't working labels Dec 29, 2023
@mcbenjemaa
Member

mcbenjemaa commented Dec 29, 2023

@ricottatosta

As I can see, you're trying to use Talos as a control-plane and bootstrap provider.
It's recommended that you submit this issue in talos-control-plane-provider

@ricottatosta
Author

Thanks for changing the type of my submission to 'support'.
The issue involves the cidata.iso that capmox creates to pass configuration via cloud-init, and the Talos VM boot image that reads it at boot time.
Or am I supposed to believe that the Talos bootstrap and control plane providers are involved in creating cidata.iso?
Maybe the issue should be submitted to "talos os" instead.

@mcbenjemaa
Member

mcbenjemaa commented Dec 29, 2023

Hmmm,
CAPMOX currently only creates cidata supported by Debian and Ubuntu distributions.
If it's related to supporting Talos cloud-init, you're right.

I don't know how Talos loads the cloud-init ISO.

If you know something, I can help you add support for Talos.

@ricottatosta
Author

As far as I know, Talos expects the cloud-init datasource to be in "nocloud" format. In this format, the network-config file can't start with "network:".
Because of the "network:" wrapper, Talos doesn't find the top-level key "version: 2" and assumes "version: 0".
There is nothing I can do on my side: that is how you build cidata.iso, and it depends on how cloud-init expects data to be arranged in distributions like Debian and Ubuntu. You should find a way, either by arranging the data in a way that works in all situations or by letting the user choose which datasource format to produce in cidata.iso.
I know this means more effort. But I think Talos and CAPMOX are great together.
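
To illustrate the mismatch described above, here is a minimal Go sketch. It is not Talos's or CAPMOX's actual code: the helper names topLevelVersion and unwrapNetworkKey are hypothetical, and it uses plain line scanning rather than a real YAML parser. It only shows that a document wrapped in "network:" has no unindented "version" key (so a reader that looks only at the top level falls back to 0), and that removing the wrapper exposes "version: 2".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// topLevelVersion returns the value of an unindented "version:" key,
// or 0 when no such key exists at the top level.
func topLevelVersion(cfg string) int {
	for _, line := range strings.Split(cfg, "\n") {
		if strings.HasPrefix(line, "version:") {
			rest := strings.TrimSpace(strings.TrimPrefix(line, "version:"))
			if v, err := strconv.Atoi(rest); err == nil {
				return v
			}
		}
	}
	return 0
}

// unwrapNetworkKey drops a leading "network:" line and removes two spaces
// of indentation from every following line. It assumes two-space indentation
// and returns the input unchanged when there is no wrapper.
func unwrapNetworkKey(cfg string) string {
	lines := strings.Split(cfg, "\n")
	if strings.TrimSpace(lines[0]) != "network:" {
		return cfg
	}
	out := make([]string, 0, len(lines)-1)
	for _, line := range lines[1:] {
		out = append(out, strings.TrimPrefix(line, "  "))
	}
	return strings.Join(out, "\n")
}

func main() {
	wrapped := "network:\n  version: 2\n  ethernets:\n    eth0:\n      dhcp4: true\n"
	fmt.Println(topLevelVersion(wrapped))                   // wrapper hides the key
	fmt.Println(topLevelVersion(unwrapNetworkKey(wrapped))) // key is now top-level
}
```

With the sample input, the first call reports 0 and the second reports 2.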

@ricottatosta
Author

A couple of useful links:
Talos NoCloud
Cloud-Init NoCloud datasource

@mcbenjemaa
Member

Will check this out

@ricottatosta
Author

This is the relevant console output of the Talos NoCloud image while booting:

[talos] found config disk (cidata) at /dev/sr0
ISO 9660 Extensions: IEEE_P1282
[talos] fetching meta config from: cidata/meta-data
[talos] fetching network config from: cidata/network-config
[talos] fetching machine config from: cidata/user-data
[talos] restarting platform network config {"component": "controller-runtime", "controller": "network.PlatformConfigController", ..., "error": "network-config metadata version=0 is not supported"}

@ricottatosta ricottatosta changed the title Possible wrong Coud-Init network-config generated file Possible wrong Cloud-Init network-config generated file Jan 7, 2024
@mcbenjemaa
Member

I will check whether this works with other Netplan distros, and then I will implement it.

@ricottatosta
Author

I'm really looking forward to hearing good news from you.

@mcbenjemaa
Member

I will test it on Ubuntu, but can you test it on Talos?

@ricottatosta
Author

ricottatosta commented Jan 16, 2024

What am I supposed to test? Release 0.1.1?
Does it contain any changes in the code involved in the issue?

@mcbenjemaa
Member

Currently, we support only Netplan-based distros.
We will try to add support for other network-config formats.

If someone wants to put in the effort and add this, that would be great.

@ricottatosta
Author

OK. That was expected.
Unfortunately I'm not good at programming in Go, and the solution I found is not suitable for integration into your code.
As my solution is quite simple (just deleting a handful of characters from a template string), I'll patch your code every time you release a new version.
Thank you for your support.

@mcbenjemaa
Member

We will try to support this soon.

@ricottatosta
Author

ricottatosta commented Jan 20, 2024

I would like to share my experience about using CAPMOX with TALOS.
This is the template I use in pkg/cloudinit/network.go:

version: 2
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: false
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - {{ $element.IPV6Address }}
    {{- end }}
  {{- if eq $index 0 }}
    {{- if $element.Gateway }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if $element.Gateway6 }}
    gateway6: {{ $element.Gateway6 }}
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
  {{- end -}}
{{- end -}}

TALOS dislikes defining static routes in place of gateways and perhaps even defining nameservers for each device.
For the rest, it works like a charm.

@mcbenjemaa
Member

Thanks for sharing,
I guess the best solution for this is to support another version of network-config that's actually different from the netplan config.

@mcbenjemaa
Member

We have released a new version.

@ricottatosta
Author

ricottatosta commented Jan 28, 2024

I'm trying to make nocloud template string work like the netplan one as much as possible. But there is an issue.
If I omit to define a gateway, the injector refuses to build the cloudinit image complaining it wants it.
Is there a way to define only address and netmask for a network device?

@mcbenjemaa
Member

Oh, no
The gateway is required

@mcbenjemaa
Member

I will create a new issue, so we can treat this as a feature to support multiple cloud-init network-config formats.

However, DHCP will be included in the next release; I don't know if that helps you.

@mcbenjemaa
Member

Issue is added

#94

@65278
Collaborator

65278 commented Apr 24, 2024

I'm trying to make nocloud template string work like the netplan one as much as possible. But there is an issue. If I omit to define a gateway, the injector refuses to build the cloudinit image complaining it wants it. Is there a way to define only address and netmask for a network device?

One workaround I've found is to add an illegal gateway (169.254.255.254 for example). Cluster-api-provider-ipam-in-cluster will accept this, and netplan will ignore it, with a warning, when applying the config.
I have not tested this with cloud-init network config v1.
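
For context, the address suggested above lies in the IPv4 link-local range 169.254.0.0/16, which is not routable beyond the local link. A small Go sketch using the standard net/netip package can confirm that for any candidate placeholder; the helper name isLinkLocal is hypothetical and nothing here is part of CAPMOX or netplan.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isLinkLocal reports whether gw parses as a link-local unicast address
// (169.254.0.0/16 for IPv4, fe80::/10 for IPv6).
func isLinkLocal(gw string) bool {
	addr, err := netip.ParseAddr(gw)
	return err == nil && addr.IsLinkLocalUnicast()
}

func main() {
	for _, gw := range []string{"169.254.255.254", "10.100.150.254"} {
		fmt.Printf("%s link-local: %v\n", gw, isLinkLocal(gw))
	}
}
```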

@isZumpo
Copy link
Contributor

isZumpo commented May 16, 2024

I have been trying to get the talos bootstrap provider and proxmox infrastructure provider together the whole morning. Running into the network-config metadata version=0 is not supported issue... So glad to finally find this github issue about it, meaning that I probably did nothing wrong in my configuration :)

Has there been any progress on the topic since January? Could possibly some sort of variable be added that makes it generate the cidata in the nocloud format such that it is compatible with talos?

@wikkyk
Collaborator

wikkyk commented May 31, 2024

Unfortunately not. We don't use Talos and as such we can't commit to adding support for it. Patches welcome, of course, and we are more than happy to accept additional maintainers :-)

@ricottatosta
Author

ricottatosta commented May 31, 2024

CAPMOX works well with TALOS, but it needs some patches to CAPMOX's code.
Furthermore, it is possible to make your cluster "elastic", even self-managed (without an external cluster that creates and manages it).
All you need to do is modify the network.go file located at pkg/cloudinit/ in the source code and rebuild the Docker image.
If you need something ready to use, there is a Docker image at https://hub.docker.com/r/ricottatosta/cluster-api-provider-proxmox already patched for TALOS. After deploying CAPMOX, patch the CAPMOX deployment manifest so that its container image points at ricottatosta/cluster-api-provider-proxmox:[tag].
The latest version (0.3.0) has the following patch applied:

...
const (
	/* network-config template. */
	networkConfigTPl = `version: 2
renderer: networkd
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: {{ if $element.DHCP4 }}true{{ else }}false{{ end }}
    dhcp6: {{ if $element.DHCP6 }}true{{ else }}false{{ end }}
  {{- if or (and (not $element.DHCP4) $element.IPAddress) (and (not $element.DHCP6) $element.IPV6Address) }}
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - '{{ $element.IPV6Address }}'
    {{- end }}
  {{- if eq $index 0 }}
    {{- if and $element.Gateway (not $element.DHCP4) }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if and $element.Gateway6 (not $element.DHCP6) }}
    gateway6: '{{ $element.Gateway6 }}'
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
  {{- end -}}
  {{- end -}}
{{- end -}}
{{- $vrf := 0 -}}
{{- range $index, $element := .NetworkConfigData }}
{{- if eq $element.Type "vrf" }}
{{- if eq $vrf 0 }}
vrfs:
{{- $vrf = 1 }}
{{- end }}
  {{$element.Name}}:
    table: {{ $element.Table }}
    {{- if $element.Routes }}{{ template "routes" $element }}{{- end -}}
    {{- if $element.FIBRules }}{{ template "rules" $element }}{{- end -}}
    {{- if $element.Interfaces }}
    interfaces:
    {{- range $element.Interfaces }}
      - {{ . }}
    {{- end -}}
    {{- end -}}
{{- end -}}
{{- end -}}
{{- define "rules" }}
    routing-policy:
    {{- range $index, $rule := .FIBRules }}
      - {
      {{- if $rule.To }} "to": "{{$rule.To}}", {{ end -}}
      {{- if $rule.From }} "from": "{{$rule.From}}", {{ end -}}
      {{- if $rule.Priority }} "priority": {{$rule.Priority}}, {{ end -}}
      {{- if $rule.Table }} "table": {{$rule.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
{{- define "routes" }}
    routes:
    {{- range $index, $route := .Routes }}
      - {
      {{- if $route.To }} "to": "{{$route.To}}", {{ end -}}
      {{- if $route.Via }} "via": "{{$route.Via}}", {{ end -}}
      {{- if $route.Metric }} "metric": {{$route.Metric}}, {{ end -}}
      {{- if $route.Table }} "table": {{$route.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
`
)
...

It's not tested against vrf. My use case is two ethernets, public and private.
As mentioned earlier, it works like a charm.
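
A note for anyone who does exercise the vrf branch: in Go's text/template, a variable declared with := inside an if block is scoped to that block, while = (available since Go 1.11) assigns to the variable declared outside. That matters for a "print the header only once" flag such as $vrf. The sketch below is a standalone toy template, not the CAPMOX one; headerCount and the $seen variable are made up for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// headerCount renders a tiny template that is meant to print "vrfs:" once
// for three items, and returns how many times the header actually appears.
// setFlag is the action used to flip the flag inside the if block.
func headerCount(setFlag string) int {
	src := `{{- $seen := 0 -}}
{{- range . }}
{{- if eq $seen 0 }}
vrfs:
{{- ` + setFlag + ` }}
{{- end }}
  {{ . }}:
{{- end }}
`
	t := template.Must(template.New("scope").Parse(src))
	var sb strings.Builder
	if err := t.Execute(&sb, []string{"a", "b", "c"}); err != nil {
		panic(err)
	}
	return strings.Count(sb.String(), "vrfs:")
}

func main() {
	// ":=" declares a new $seen that disappears at the end of the if block.
	fmt.Println(headerCount("$seen := 1"))
	// "=" updates the outer $seen, so the header is printed only once.
	fmt.Println(headerCount("$seen = 1"))
}
```

The first call reports 3 headers and the second reports 1, so the assignment form is the one that keeps a single "vrfs:" key when several vrf entries are present.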

@isZumpo
Contributor

isZumpo commented Jun 3, 2024

Thanks! This patch appears to be working. I am now getting past the point where it was complaining that: "network-config metadata version=0 is not supported" :)

On a related point, how are you setting up the initial network for the control plane using Talos and Proxmox? I am attempting to use the VIP solution built into Talos, but it does not seem to be working. If you don't mind, it would be very nice to see an example of your TalosControlPlane and ProxmoxCluster objects.

@ricottatosta
Author

ricottatosta commented Jun 3, 2024

For networking I use Cilium without kube-proxy.
For VIP I use kube-vip in BGP mode as daemonset.
Following is what you asked for:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: k8s-test
  namespace: k8s-test
spec:
  allowedNodes:
    - pve1
    - pve2
    - pve3
  controlPlaneEndpoint:
    host: 10.100.150.150  # my VIP
    port: 6443
  dnsServers:
    - 10.100.150.1
  ipv4Config:
    addresses:
      - 10.100.150.151-10.100.150.159
    gateway: 10.100.150.254
    prefix: 24
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: k8s-test-control-plane
  namespace: k8s-test
spec:
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.6.1
      configPatches:
        - op: add
          path: /machine/network/extraHostEntries
          value:
            - ip: 127.0.0.1
              aliases:
                - kubernetes
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: ProxmoxMachineTemplate
    name: k8s-test-control-plane
  replicas: 1
  version: 1.28.3

@wikkyk
Collaborator

wikkyk commented Jun 10, 2024

@ricottatosta Thank you for this - can you submit it as a PR, please?

@ricottatosta ricottatosta changed the title Possible wrong Cloud-Init network-config generated file Support for Cloud-Init network-config file in nocloud format Nov 25, 2024
@ricottatosta ricottatosta changed the title Support for Cloud-Init network-config file in nocloud format Support for Cloud-Init network-config file in 'nocloud' format Nov 25, 2024
@ricottatosta
Author

After commenting on PR #290, here is a recap of the issue.
Talos expects network-config to be in cloud-init nocloud format. A parameter is needed in the ProxmoxMachine spec so that the requested format can be selected when cidata.iso is created.
I have prepared cluster-template-talos.yaml as a baseline for further templating.
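
A minimal sketch of what such a format switch could look like on the generating side, assuming a hypothetical Format value passed down from the machine spec. Neither the type, the constant names, nor renderNetworkConfig exist in CAPMOX as far as this thread shows; they are invented here for illustration. The netplan variant nests the body under "network:", while the nocloud variant emits it unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// Format is a hypothetical selector for the network-config layout.
type Format string

const (
	FormatNetplan Format = "netplan" // body nested under a top-level "network:" key
	FormatNoCloud Format = "nocloud" // body starts directly with "version: 2"
)

// renderNetworkConfig wraps an already rendered body according to f.
// An empty format falls back to the netplan layout.
func renderNetworkConfig(body string, f Format) (string, error) {
	switch f {
	case FormatNoCloud:
		return body, nil
	case FormatNetplan, "":
		var sb strings.Builder
		sb.WriteString("network:\n")
		for _, line := range strings.Split(strings.TrimRight(body, "\n"), "\n") {
			sb.WriteString("  " + line + "\n")
		}
		return sb.String(), nil
	default:
		return "", fmt.Errorf("unknown network-config format %q", f)
	}
}

func main() {
	body := "version: 2\nethernets: {}\n"
	for _, f := range []Format{FormatNetplan, FormatNoCloud} {
		out, err := renderNetworkConfig(body, f)
		if err != nil {
			panic(err)
		}
		fmt.Printf("--- %s ---\n%s", f, out)
	}
}
```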

@ricottatosta
Author

After reading the related issue on Talos, I guess there is no need for further discussion. Talos is going to support the cloud-init format, isn't it? Thanks to @ekarlso.

@ricottatosta
Author

Better. Talos 1.8.1 already supports it! Thank you all!

@isZumpo
Contributor

isZumpo commented Nov 25, 2024

Better. Talos 1.8.1 already supports it! Thank you all!

~~I removed your patch from my setup and reinitialized the cluster with talos 1.8.3 and it all just appears to work. Nice find! :+1:~~

Never mind, I have some issues related to the network setup. Maybe related to #291: the nodes appear to come up and start (no crash loop like we had before), but it seems to fail at creating the VIP for the control plane:
[image]
This was working before when using the patch. Any ideas what it could be?

@mcbenjemaa
Member

@isZumpo try to add the skipCloudInitStatus and skipQemuGuestAgent

@isZumpo
Contributor

isZumpo commented Nov 26, 2024

@isZumpo try to add the skipCloudInitStatus and skipQemuGuestAgent

Appears to be working 🥳

For those who run into the same issue, these are the fields which @mcbenjemaa mentions:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: talos-cluster-worker
spec:
  template:
    spec:
      ...
      checks:
        skipQemuGuestAgent: true
        skipCloudInitStatus: true

However, there is one issue with the above solution: it is not bundled with any recent release of this provider yet. Perhaps it's time for a new release?

Until then, getting it installed is quite tricky. I am using the cluster-api-operator to declaratively install everything cluster-api related. I had to make a release in my fork of this repo, tweak the release files a bit, and point my InfrastructureProvider towards it like this:

apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
 name: proxmox
 namespace: proxmox-infrastructure-system
spec:
 version: v0.5.2
 fetchConfig:
   url: https://github.com/isZumpo/cluster-api-provider-proxmox/releases/latest/infrastructure-components.yaml

Will make a PR soon which makes the fork release process a bit smoother for those who also use cluster-api-operator.

@glitchcrab

@isZumpo do you have any example repos anywhere which show how you deploy Talos clusters using this provider? I'm currently using Sidero on Proxmox but I'd love to use CAPMOX as it would be cleaner - I'm just struggling a little to get my head around which components I need.

@mcbenjemaa
Member

@glitchcrab check this repo; it has some instructions, but it's not complete.
https://github.com/ekarlso/talos-on-proxmox-with-cluster-api/

I hope @ekarlso can finish it 😅

@rouke-broersma