Replies: 1 comment
-
You can try doing … But the best way is to submit an issue with full logs and a way to reproduce.
-
Hi,
I cannot upgrade Talos (currently on 1.8.2) because the cluster is not shutting down all pods properly, and Talos is reverting the upgrade.
The upgrade command is:
talosctl upgrade --nodes 10.0.50.1 --image factory.talos.dev/installer/01afe9cdcc0d4f3c7de8b551795019845eed0eafcf87aa2dd264af999aabc9a0:v1.9.2 --preserve --timeout=2h0m0s
Upon issuing the command, Talos tries to drain the node and gets stuck at the Ceph provisioners and the pods with PVs, because they throw errors about not being able to reach the Ceph cluster. That makes sense: the Ceph cluster drains pretty fast, and it seems this prevents the other pods from terminating.
At the end of the drain, the pods that need PVs are stuck in Running or Terminating, and the Ceph provisioners and the NVIDIA drivers are still running. I can manually force-terminate the pods that need PVs; however, the other pods belong to DaemonSets, which are unaffected by the draining and keep running.
This, in turn, leads to the cluster not draining entirely and Talos reverting the upgrade.
Has anyone else run into such issues, or does anyone know how to get this working?
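For reference, the manual force-termination mentioned above can be sketched roughly as follows. This is only an illustration, not a recommended fix: the helper names `terminating_pods` and `force_delete_pods` and the `NODE_NAME` variable are hypothetical, and it assumes `kubectl` access to the cluster. Force-deleting skips graceful shutdown, so it may be unsafe for pods that still hold volumes.

```shell
#!/bin/sh
# Sketch only: helper names and NODE_NAME are hypothetical, not part of Talos.

# Print "namespace name" for every pod whose STATUS column is Terminating.
# Expects `kubectl get pods -A --no-headers` style lines on stdin
# (columns: NAMESPACE NAME READY STATUS ...).
terminating_pods() {
  awk '$4 == "Terminating" { print $1, $2 }'
}

# Force-delete each "namespace name" pair read from stdin.
# With DRY_RUN=1 (the default here) the commands are only printed.
force_delete_pods() {
  while read -r ns name; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "kubectl delete pod $name -n $ns --grace-period=0 --force"
    else
      kubectl delete pod "$name" -n "$ns" --grace-period=0 --force
    fi
  done
}
```

Assuming `NODE_NAME` holds the Kubernetes name of the node being upgraded, a dry run would look like `kubectl get pods -A --no-headers --field-selector spec.nodeName="$NODE_NAME" | terminating_pods | force_delete_pods`; setting `DRY_RUN=0` would actually delete the pods. DaemonSet pods are recreated by their controller, so this does not address them.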