Performance problem: EKS control plane overwhelmed when applying large number of workloads with MeshServices enabled #12723
Labels: area/kuma-cp, area/performance, kind/bug, triage/accepted
Warning
This issue is a work in progress and describes current knowledge of this problem.
Kuma Version
any
Describe the bug
Summary
When applying a large number of services with MeshServices enabled in performance tests, the Kubernetes (EKS) control plane becomes overwhelmed at around 600-700 services. This leads to excessive logging, problems syncing deployments, and eventually etcd failures that cause Kubernetes control plane components to restart.
Observed issues
Frequent deployment sync errors in kube-controller-manager
These errors appear repeatedly for many deployments.
Endpoint slice errors before etcd becomes unstable
These errors occur across multiple services, causing delays and resource contention.
Complete failure of etcd connections
To Reproduce
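1. Deploy a large number of services with MeshServices enabled, using fake-service (e.g., 1000 services).
2. Observe that Kubernetes control plane components (kube-controller-manager, endpointslice_controller) log multiple retries and failures.
3. Eventually, connections to etcd fail completely, leading to control plane restarts.
Expected behavior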
No response
Additional context (optional)
logs-insights-results-perf-tests-debugging-250131.json
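For the first "To Reproduce" step, a minimal sketch of how the load could be generated, assuming the target mesh already has MeshServices enabled and that sidecar injection is driven by the `kuma.io/sidecar-injection` namespace label. The script prints N fake-service Deployment/Service pairs as one Kubernetes `v1` `List`. The namespace `kuma-perf`, the `fake-service-NNNN` names, the image tag, and the port numbers are hypothetical choices, not what the original performance tests used.

```python
"""Generate fake-service Deployments and Services for a Kuma scale test.

Hypothetical helper: namespace, names, image tag and ports are illustrative
assumptions, not taken from the original performance tests.
"""
import json
import sys

IMAGE = "nicholasjackson/fake-service:v0.26.0"  # assumed tag; pin whatever your tests use
NAMESPACE = "kuma-perf"  # hypothetical namespace name


def namespace(name: str) -> dict:
    # Label the namespace so Kuma injects sidecars into its pods.
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"kuma.io/sidecar-injection": "enabled"}},
    }


def deployment(name: str, ns: str) -> dict:
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": ns},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "app",
                        "image": IMAGE,
                        "ports": [{"containerPort": 9090}],
                        # fake-service reads its listen address and name from env vars.
                        "env": [
                            {"name": "LISTEN_ADDR", "value": "0.0.0.0:9090"},
                            {"name": "NAME", "value": name},
                        ],
                    }],
                },
            },
        },
    }


def service(name: str, ns: str) -> dict:
    # One Kubernetes Service per Deployment; with MeshServices enabled,
    # each of these is expected to be backed by its own MeshService.
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": ns},
        "spec": {
            "selector": {"app": name},
            "ports": [{"name": "http", "port": 80, "targetPort": 9090, "appProtocol": "http"}],
        },
    }


def manifests(count: int, ns: str = NAMESPACE) -> dict:
    items = [namespace(ns)]  # namespace first so later items can land in it
    for i in range(count):
        name = f"fake-service-{i:04d}"
        items += [deployment(name, ns), service(name, ns)]
    # A v1 List is accepted by `kubectl apply -f`; JSON is also valid YAML.
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    json.dump(manifests(count), sys.stdout, indent=2)
```

Usage: `python gen_fake_services.py 1000 > services.json && kubectl apply -f services.json`. Because `kubectl apply` is declarative, rerunning with a larger count only adds the new services, which makes it possible to step up gradually toward the 600-700 range where the problems start.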
Issues started around 09:50:



