Here's a more precise description of how to reproduce the issue:

1. Remove the calico-node daemonset (Felix).
2. Remove the /var/lib/calico/mtu file on the worker node.
3. Deploy test pods on the worker node.
4. Since there is no CNI, the pods are not deployed and wait in the ContainerCreating state.
5. Deploy the calico-node daemonset (Felix).
6. Check the status of the mass-deployed pods - some have their MTU set to 1500.
Here is the code that appears relevant to the issue.
In the failing case, the MTUFromFile function should have emitted the log message below, but it does not appear in the node or felix logs.
logrus.Infof("File %s does not exist", filename)
The error check only verifies whether the file exists; if the file exists but contains no MTU value, the check still passes, which is the suspected cause of the problem.
I guess the reason for that is that some pods are created after calico-node (the Felix part) starts up and does all the detection, while some are started right after the CNI is available but before the file has been written. Idk why the log is not printed though if there is no file yet 🤔
@tomastigera
Currently only the existence of the file is validated, but I think logic to verify that the file actually contains content should also be added.
There is a problem with the MTU setting of the calico veth NIC (e.g. calif4fad8d7ab0).
The calico specs are as follows:
The test host NIC environment is as follows:
The following was also found in the felix log:
felix/int_dataplane.go 1086: Determined pod MTU mtu=1400
The MTU of the veth NIC assigned to the Pod should be calculated as follows and set to 1400.
However, with some probability, 1500 is assigned to a particular veth interface, such as calif4fad8d7ab0.
## Expected Behavior

All calico veth interfaces must have their MTU set to 1400.
## Current Behavior

Some calico veth NICs are randomly set to 1500.
## Possible Solution

Deleting and redeploying the Pods, or restarting the node, works around the issue.
## Steps to Reproduce (for bugs)

## Context

k8s cluster networking error

## Your Environment