Talos k8s with AWS CCM support #173

Merged: 1 commit into isovalent:main on Feb 17, 2025

Conversation

erikvveen (Contributor)

No description provided.

@PhilipSchmid (Collaborator) left a comment


Thanks for adding this! I commented on a few things to address. Once done, please also run terraform fmt, make docs, and squash all commits into a single one with a meaningful message.
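
For reference, those cleanup steps roughly correspond to the following (the -recursive flag and the rebase base branch are assumptions, adjust as needed):

$ terraform fmt -recursive    # format all *.tf files
$ make docs                   # regenerate the module documentation
$ git rebase -i origin/main   # squash all commits into a single one with a meaningful message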

@PhilipSchmid (Collaborator) left a comment


Thanks for the updated version! I commented on a few nits (mostly with ready-to-apply suggestions). As soon as those are addressed, we can merge it.

@PhilipSchmid (Collaborator) left a comment


Have you tested this change without enabling your newly introduced variables? The example doesn't start anymore because the CP nodes no longer have the node-role.kubernetes.io/control-plane= label set. This is a problem for the subsequent Cilium TF module, because it waits for all CP nodes to be available before the installation.

With your changes:

$ kgn --show-labels
NAME                  STATUS     ROLES    AGE     VERSION   LABELS
i-0238e1f62a0343992   NotReady   <none>   4m24s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=i-0238e1f62a0343992,kubernetes.io/os=linux
i-07316ad6229166609   NotReady   <none>   4m25s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=i-07316ad6229166609,kubernetes.io/os=linux
i-0cf8f2aeaa9e65467   NotReady   <none>   4m25s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=i-0cf8f2aeaa9e65467,kubernetes.io/os=linux
i-0d9c6cf2b08719873   NotReady   <none>   4m24s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=i-0d9c6cf2b08719873,kubernetes.io/os=linux
i-0f6e0c1f46443a8f4   NotReady   <none>   4m26s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=i-0f6e0c1f46443a8f4,kubernetes.io/os=linux

Before your changes:

$ kgn --show-labels
NAME              STATUS   ROLES           AGE    VERSION   LABELS
ip-10-0-100-197   Ready    control-plane   102s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-100-197,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
ip-10-0-100-199   Ready    <none>          104s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-100-199,kubernetes.io/os=linux
ip-10-0-101-164   Ready    control-plane   102s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-101-164,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
ip-10-0-101-28    Ready    <none>          104s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-101-28,kubernetes.io/os=linux
ip-10-0-102-180   Ready    control-plane   102s   v1.31.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-102-180,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
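
For illustration only, one possible way to get the role label back onto the CP nodes would be a Talos machine config patch using machine.nodeLabels. This is just a sketch under the assumption that Talos propagates the restricted node-role label for control plane nodes; the patch file name and node IP are placeholders:

$ cat > cp-role-label-patch.yaml <<'EOF'
machine:
  nodeLabels:
    node-role.kubernetes.io/control-plane: ""
EOF
$ talosctl patch machineconfig --nodes <control-plane-node-ip> --patch @cp-role-label-patch.yaml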

@erikvveen (Contributor, Author) commented Feb 10, 2025 via email

@PhilipSchmid (Collaborator)

@erikvveen I've refactored a few things (see 3889481), and it now works on my end - would you please test it as well and confirm?

I took a few things from:

... and changed the var.metadata_options default values back to the AWS TF provider defaults as it seems the AWS Cloud Controller struggles with http_tokens=required.
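
For example (instance ID is a placeholder), the effective IMDS settings of a launched instance can be double-checked with the AWS CLI; with the provider defaults this should report http_tokens as optional rather than required:

$ aws ec2 describe-instances \
    --instance-ids <instance-id> \
    --query 'Reservations[].Instances[].MetadataOptions'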

Once you confirm, I'll squash everything, run the CI stuff, and merge it eventually.

@erikvveen (Contributor, Author) commented Feb 17, 2025 via email

@PhilipSchmid (Collaborator)

@erikvveen What exactly do you mean by node names following a mixed pattern? When issuing kubectl get nodes, all the nodes follow the same IP-based naming pattern:

$ kubectl get nodes
NAME                                          STATUS   ROLES           AGE   VERSION
ip-10-0-100-145.eu-north-1.compute.internal   Ready    control-plane   86s   v1.31.4
ip-10-0-101-23.eu-north-1.compute.internal    Ready    control-plane   85s   v1.31.4
ip-10-0-101-245.eu-north-1.compute.internal   Ready    <none>          87s   v1.31.4
ip-10-0-102-121.eu-north-1.compute.internal   Ready    <none>          87s   v1.31.4
ip-10-0-102-75.eu-north-1.compute.internal    Ready    control-plane   86s   v1.31.4

Yes, it doesn't directly follow the ip-xxx-xxx-xxx-xxx.ec2.<region>.internal pattern mentioned in the docs, but it looks like ip-xxx-xxx-xxx-xxx.<region>.compute.internal works fine as well:

$ kubectl get pods -n kube-system -l k8s-app=aws-cloud-controller-manager
NAME                                 READY   STATUS    RESTARTS   AGE
aws-cloud-controller-manager-7pfsh   1/1     Running   0          3m14s
aws-cloud-controller-manager-f2njz   1/1     Running   0          3m14s
aws-cloud-controller-manager-hhpxv   1/1     Running   0          3m14s

$ kubectl logs -n kube-system aws-cloud-controller-manager-7pfsh
I0217 12:43:08.447721       1 serving.go:348] Generated self-signed cert in-memory
I0217 12:43:08.744423       1 serving.go:348] Generated self-signed cert in-memory
W0217 12:43:08.744542       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0217 12:43:09.310760       1 requestheader_controller.go:244] Loaded a new request header values for RequestHeaderAuthRequestController
I0217 12:43:09.312180       1 aws.go:681] Loading region from metadata service
I0217 12:43:12.469956       1 aws.go:1341] Building AWS cloudprovider
I0217 12:43:12.470022       1 aws.go:681] Loading region from metadata service
I0217 12:43:25.281692       1 tags.go:77] AWS cloud filtering on ClusterID: talos-cute
I0217 12:43:25.281774       1 aws.go:1431] The following IP families will be added to nodes: [ipv4]
I0217 12:43:25.281808       1 controllermanager.go:167] Version: v1.27.1
I0217 12:43:25.291378       1 tlsconfig.go:200] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1739796188\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1739796188\" (2025-02-17 11:43:08 +0000 UTC to 2026-02-17 11:43:08 +0000 UTC (now=2025-02-17 12:43:25.291328146 +0000 UTC))"
I0217 12:43:25.291814       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0217 12:43:25.292110       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0217 12:43:25.291835       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0217 12:43:25.292324       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0217 12:43:25.291860       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0217 12:43:25.292794       1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0217 12:43:25.294518       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1739796189\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1739796189\" (2025-02-17 11:43:08 +0000 UTC to 2026-02-17 11:43:08 +0000 UTC (now=2025-02-17 12:43:25.294479324 +0000 UTC))"
I0217 12:43:25.294592       1 secure_serving.go:210] Serving securely on [::]:10258
I0217 12:43:25.295126       1 leaderelection.go:245] attempting to acquire leader lease kube-system/cloud-controller-manager...
I0217 12:43:25.295666       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0217 12:43:25.392679       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0217 12:43:25.392804       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0217 12:43:25.392842       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0217 12:43:25.393076       1 tlsconfig.go:178] "Loaded client CA" index=0 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"\" [serving,client] groups=[kubernetes] issuer=\"<self>\" (2025-02-17 12:39:53 +0000 UTC to 2035-02-15 12:39:53 +0000 UTC (now=2025-02-17 12:43:25.393018897 +0000 UTC))"
I0217 12:43:25.393995       1 tlsconfig.go:200] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1739796188\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1739796188\" (2025-02-17 11:43:08 +0000 UTC to 2026-02-17 11:43:08 +0000 UTC (now=2025-02-17 12:43:25.393964012 +0000 UTC))"
I0217 12:43:25.394906       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1739796189\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1739796189\" (2025-02-17 11:43:08 +0000 UTC to 2026-02-17 11:43:08 +0000 UTC (now=2025-02-17 12:43:25.394877431 +0000 UTC))"
I0217 12:43:25.395176       1 tlsconfig.go:178] "Loaded client CA" index=0 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"\" [serving,client] groups=[kubernetes] issuer=\"<self>\" (2025-02-17 12:39:53 +0000 UTC to 2035-02-15 12:39:53 +0000 UTC (now=2025-02-17 12:43:25.395144595 +0000 UTC))"
I0217 12:43:25.395356       1 tlsconfig.go:178] "Loaded client CA" index=1 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"\" [serving,client] issuer=\"<self>\" (2025-02-17 12:39:53 +0000 UTC to 2035-02-15 12:39:53 +0000 UTC (now=2025-02-17 12:43:25.395314115 +0000 UTC))"
I0217 12:43:25.396124       1 tlsconfig.go:200] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1739796188\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1739796188\" (2025-02-17 11:43:08 +0000 UTC to 2026-02-17 11:43:08 +0000 UTC (now=2025-02-17 12:43:25.396098799 +0000 UTC))"
I0217 12:43:25.396915       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1739796189\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1739796189\" (2025-02-17 11:43:08 +0000 UTC to 2026-02-17 11:43:08 +0000 UTC (now=2025-02-17 12:43:25.396886485 +0000 UTC))"

Hence, the Node name conventions prerequisite seems to be fulfilled. The EC2 instance name on the AWS portal doesn't matter, as far as I can tell.
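
As an additional sanity check, something along these lines (the jsonpath/grep combination is only an illustrative sketch) can confirm that every node name matches one of the accepted EC2 private-DNS patterns:

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' \
    | grep -E '^ip-([0-9]+-){3}[0-9]+\..+\.(compute\.)?internal$'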

@erikvveen (Contributor, Author) commented Feb 17, 2025 via email

@PhilipSchmid merged commit dee9e31 into isovalent:main on Feb 17, 2025
2 of 8 checks passed