-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] k8s gateway cluster names #10403
Conversation
@@ -68,10 +66,6 @@ type translatorInstance struct { | |||
shouldEnforceNamespaceMatch bool | |||
} | |||
|
|||
func NewDefaultTranslator(settings *v1.Settings, pluginRegistry plugins.PluginRegistry) *translatorInstance { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess this wasn't used anywhere
Issues linked to changelog: |
Visit the preview URL for this PR (updated for commit 9f6c3ff): https://gloo-edge--pr10403-kubegateway-clustern-c73iax59.web.app (expires Mon, 09 Dec 2024 20:56:24 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 Sign: 77c2b86e287749579b7ff9cadb81e099042ef677 |
} | ||
|
||
// Don't use dots in the name as it messes up prometheus stats | ||
return fmt.Sprintf("%s_%s", ref.GetName(), ref.GetNamespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a limit on the max length? IIRC grafana has something around 128 chars which might be similar for promethus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure which Grafana limit you have in mind (it's just the visualization layer backed by a regular database), but in Prometheus one can specify limits to label name/values in bytes. These are all 0 by default.
While no one is a big fan of long metric/label names, Envoy is doing a pretty terrible job on this front already and we aren't really making that worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A max length for the Prometheus metric key. Just want to be on the safe side in case there's a limit and the key gets truncated
@@ -30,7 +54,7 @@ var _ = Describe("Plugin", func() { | |||
}, | |||
} | |||
out := &envoy_config_route_v3.Route{} | |||
err := p.ProcessRoute(plugins.RouteParams{}, in, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we need to test an empty RouteParams ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realistically the RouteParams will never be empty, we have several plugins that do a params.Snapshot.<resourceType>.Find
without doing a nil check. This plugin now does a Find
where it didn't used to, so thats why the test had to change to pass in a non-empty route params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need an upgrade test to prove that this is a zero downtime upgrade OR clear documentation outlining how to safely perform an upgrade from 1.17 to 1.18
description: >- | ||
When using the Kubernetes Gateway API, envoy cluster names corresponding to Kubernetes Service Upstreams | ||
are now in a new more parseable format that allows us to extract info about the Service for metrics: | ||
`upstreamName_upstreamNs_svcNs_svcName_svcPort` (underscore-separated). Note that as long as Kubernetes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this up-to-date: upstreamName_upstreamNs_svcNs_svcName_svcPort
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
closing in favor of #10439 which only modifies stats names |
Description
When Kubernetes Gateway integration is enabled (i.e. Gloo Gateway is installed with
kubeGateway.enabled=true
), envoy cluster names corresponding to Kubernetes Services or kube-type Upstreams are now in a new more parseable format that allows us to extract info about the Service for metrics.For a Service destination, the cluster name will be in the format:
kube-svc:upstreamName_upstreamNs_svcNs_svcName_svcPort
For a Kube Upstream destination, the cluster name will be in the format:
kube-upstream:upstreamName_upstreamNs_svcNs_svcName_svcPort
Notes:
upstreamName_upstreamNs
).Background
Currently, the cluster names we use in envoy are constructed from upstreams. There are a few types of upstreams:
discovery
pod (when discovery is enabled) from k8s services and the upstream names follow the formatsvcNs-svcName-svcPort
kube-svc:svcNs-svcName-svcPort
In all of the above cases, the cluster name was previously constructed by appending the upstream name and namespace (e.g. in-memory upstreams would result in a cluster name
kube-svc:svcNs-svcName-svcPort_upstreamNs
). Since hyphens can appear in names/namespaces, it was difficult to parse out the workload (service) name/namespace just given the cluster name (also it wouldn't work at all for manually-created upstreams since those names can be anything).To fix this, we now extract info from the Upstream (regardless of how it was created) to put in the cluster name and use
_
as a separator. Currently we only do this for "kube" type upstreams, but we can later extend to other types (static, aws, etc). Also since cluster name changes could potentially result in some downtime during an upgrade, we are only making this change if k8s gateway integration is enabled.Code changes
The main change was that the upstream-to-cluster conversion code has changed to take in an Upstream object, rather than just the upstream ref (so we can extract info from the spec and use it to contruct the cluster name). As a result, all the places that used this code had some refactoring done. There were 3 categories of places that called
UpstreamToClusterName
:recordedUpstreams
in the code). There was no need to use the new conversion code here as all we need is a unique key, so I changed these places to just use the upstream refKey()
upstreamName_upstreamNs
)Interesting decisions
We went through a few iterations of how to do this. We first tried storing service name/namespace as annotations on the Upstream, however that only worked for in-memory upstreams since we didn't want to modify persisted upstreams with newly added annotations. We tried doing the new cluster name format only for k8s gateway proxies, but in those cases we kept running into issues with not having the necessary info the plumb through everywhere it was needed (e.g. proxy type wasn't always available when we try to convert upstream to cluster names). Landed on current design as it was easier just to turn on/off the behavior globally based on whether the k8s gateway enabled flag was on, and passing the whole upstream because it had all the info we needed on the spec.
Testing
Unit tests were added for converting between upstreams and cluster names.
End-to-end testing was manually done via the following steps:
Automated e2e testing, which sends traffic to a route and ensures the right cluster names show up in the envoy metrics, will be added in a follow-up PR.
Checklist: