OTAGENT-254 Add support for enhanced RBAC permissions for otel-agent #1693

Merged: 2 commits, Mar 6, 2025

@krlv (Contributor) commented Feb 7, 2025

What this PR does / why we need it:

Add support for the additional RBAC permissions required by the k8sattributes processor.

RBAC Permission Issues in otel-agent

The OpenTelemetry Collector’s k8sattributes processor requires specific RBAC permissions to query the Kubernetes API server and enrich telemetry data with Kubernetes metadata. The default Service Account created by the Helm chart does not have these permissions, because the node agent (deployed as a DaemonSet) retrieves pod information from the Kubelet /pods endpoint instead. This approach avoids overloading the Kubernetes API server in large clusters, but it causes the following error in otel-agent when k8sattributes is enabled:

Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User <serviceAccount> cannot list resource "pods" in API group "" at the cluster scope

Proposed Solution

Create the RBAC ClusterRole and ClusterRoleBinding that the k8sattributes processor needs. Creation is gated by a new boolean parameter, datadog.otelCollector.rbac.create. When it is set to true, the chart inspects the OTel configuration; if any k8sattributes processor has passthrough disabled (i.e., it needs to call the Kubernetes API), the Helm chart generates the required RBAC resources.
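To make the passthrough condition concrete, here is an illustrative OTel Collector configuration fragment. It is a sketch, not taken from this PR: the instance name k8sattributes/passthrough is hypothetical, and it assumes the processor's documented behavior that passthrough defaults to false when omitted.

```yaml
# Illustrative OTel Collector config fragment (hypothetical, not from this PR).
processors:
  # passthrough is omitted, so it defaults to false: this instance calls the
  # Kubernetes API and therefore needs the additional RBAC permissions.
  k8sattributes:

  # Hypothetical second instance with passthrough enabled: it only tags
  # telemetry with the pod IP and makes no Kubernetes API calls, so on its
  # own it would not require the extra RBAC resources.
  k8sattributes/passthrough:
    passthrough: true
```

With a configuration like this, the first instance alone would be enough for the chart to generate the ClusterRole and ClusterRoleBinding, provided the RBAC parameters described in this PR are enabled.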

Implementation Details

  1. New boolean parameter: datadog.otelCollector.rbac.create (true | false).
  2. Conditional RBAC generation:
    1. If agents.rbac.create: true and datadog.otelCollector.rbac.create: true, the Helm chart checks the OTel Collector configuration for any k8sattributes processor with the passthrough option disabled (either by omitting it or by explicitly setting passthrough: false).
    2. If found, the necessary K8s ClusterRole and ClusterRoleBinding are generated and associated with the default agent’s Service Account.
  3. New list parameter: datadog.otelCollector.rbac.rules: []
    1. If agents.rbac.create: true and datadog.otelCollector.rbac.create: true, and datadog.otelCollector.rbac.rules is provided, these rules are added to the otel-agent ClusterRole.
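The parameters above could be combined in a values file roughly as follows. This is a minimal sketch: the three parameter names come from this PR, but the shape of each rules entry (standard Kubernetes PolicyRule fields) and the example rule itself are assumptions for illustration. The otel-agent must also be enabled for the template to render, which is not shown here.

```yaml
# values.yaml sketch (rule contents are hypothetical).
agents:
  rbac:
    create: true          # existing toggle; must be true for any RBAC generation

datadog:
  otelCollector:
    rbac:
      create: true        # new toggle introduced by this PR
      # Extra rules appended to the otel-agent ClusterRole.
      # Assumed to follow the Kubernetes PolicyRule schema.
      rules:
        - apiGroups: [""]
          resources: ["endpoints"]
          verbs: ["get", "list", "watch"]
```

Setting datadog.otelCollector.rbac.create to false would leave the agent's existing RBAC untouched, which is the independent control the PR description argues for.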

Special notes for your reviewer:

Why Cluster Role for additional RBAC permissions?

With only a namespaced RoleBinding, the k8sattributes processor cannot query metadata such as labels and annotations from Kubernetes nodes and namespaces, which are cluster-scoped objects. It also cannot set the k8s.cluster.uid attribute, if enabled, because that attribute is set to the UID of the kube-system namespace, which is not queryable with namespaced RBAC.
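For reference, a cluster-scoped grant for the k8sattributes processor typically looks like the sketch below. The rules follow the permissions listed in the upstream k8sattributes processor documentation; the resource names, the Service Account name, and the namespace are hypothetical, and the exact manifests this chart renders are not shown in this PR description.

```yaml
# Sketch of cluster-scoped RBAC for k8sattributes (names are hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-otel-agent
rules:
  # Pods, namespaces, and nodes are read for metadata enrichment;
  # namespaces and nodes are cluster-scoped, hence a ClusterRole.
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "list", "watch"]
  # ReplicaSets are read to resolve deployment-level attributes.
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: example-otel-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: example-otel-agent
subjects:
  - kind: ServiceAccount
    name: example-datadog-agent   # hypothetical: the agent's default Service Account
    namespace: example-namespace  # hypothetical
```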

Why introduce a separate datadog.otelCollector.rbac parameter vs reusing agents.rbac?
  1. Granular Control: A separate parameter allows the additional permissions to be enabled or disabled independently.
  2. Clarity Around OTel Functionality: Tying these specific permissions to the section of the Helm chart that configures the Embedded OTel Collector helps customers understand which feature drives the need for the extra RBAC.

Note: technically, any ClusterRole and ClusterRoleBinding introduced under datadog.otelCollector.rbac will apply to the entire pod housing the Datadog Agent with Embedded OTel Collector. While this may not perfectly map to the container-based naming convention, it provides a convenient and explicit toggle for customers. There is precedent in the datadog.secretBackend.roles parameter: it is scoped to the Secret Backend feature, but its changes affect the default Service Account, so the additional permissions are available to all containers in the node agent pod.

Checklist

  • Chart Version bumped
  • Documentation has been updated with helm-docs (run: .github/helm-docs.sh)
  • CHANGELOG.md has been updated
  • Variables are documented in the README.md

@truthbk (Member) left a comment

Looks pretty good to me, I think you have a typo, but that's about it.

@@ -0,0 +1,40 @@
{{- if and .Values.agents.rbac.create (eq (include "should-enable-otel-agent" .) "true") .Values.datadog.otelCollector.rbac.create -}}
Member

Note to reviewers: should-enable-otel-agent defined here

@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch from c9ff915 to a9632ef Compare February 10, 2025 08:12
@github-actions github-actions bot added the chart/datadog label Feb 10, 2025
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch 2 times, most recently from 63669c7 to 8338179 Compare February 10, 2025 08:43
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch from 8338179 to c27e362 Compare February 19, 2025 00:02
@krlv krlv requested a review from clamoriniere February 19, 2025 00:04
@krlv krlv marked this pull request as ready for review February 19, 2025 00:05
@krlv krlv requested a review from a team as a code owner February 19, 2025 00:05
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch 3 times, most recently from 539855a to 54c9632 Compare February 28, 2025 06:52
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch 2 times, most recently from ef4cdae to cc84e27 Compare March 5, 2025 17:30
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch from de36f4c to 72d38b7 Compare March 6, 2025 18:25
@krlv (Contributor, Author) commented Mar 6, 2025

/merge

@dd-devflow bot commented Mar 6, 2025

View all feedbacks in Devflow UI.
2025-03-06 20:19:06 UTC ℹ️ Start processing command /merge


2025-03-06 20:19:09 UTC ℹ️ MergeQueue: pull request added to the queue

The median merge time in main is 38m.


2025-03-06 20:56:53 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 4f0be96 into main Mar 6, 2025
29 checks passed
@dd-mergequeue dd-mergequeue bot deleted the krlv/OTAGENT-254_k8s_permissions branch March 6, 2025 20:56
Labels: chart/datadog, mergequeue-status: done
4 participants