doc: Improved Tetragon metrics guide
Signed-off-by: Philip Schmid <phisch@cisco.com>
PhilipSchmid committed Aug 26, 2024
Commit e372cf8, parent 9e35cba

Changed file: docs/content/en/docs/installation/metrics.md (46 additions, 20 deletions)

For the full list, refer to [metrics reference]({{< ref "/docs/reference/metrics" >}}).

### Kubernetes

In a [Kubernetes installation]({{< ref "/docs/installation/kubernetes" >}}), **metrics are enabled by default** and
exposed via the `tetragon` service at the `/metrics` endpoint on port `2112` (Agent) and `2113` (Operator).

You can change the ports via Helm values:

```yaml
tetragon:
  prometheus:
    port: 2222 # default is 2112
tetragonOperator:
  prometheus:
    port: 3333 # default is 2113
```
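
These values can then be applied with a regular Helm upgrade. A minimal sketch (it assumes the chart was installed
from the `cilium` Helm repository as release `tetragon` in the `kube-system` namespace; adjust to your setup):

```shell
helm upgrade tetragon cilium/tetragon -n kube-system --reuse-values \
  --set tetragon.prometheus.port=2222 \
  --set tetragonOperator.prometheus.port=3333
```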

Or entirely disable the metrics server:

```yaml
tetragon:
  prometheus:
    enabled: false # default is true
tetragonOperator:
  prometheus:
    enabled: false # default is true
```

### Non-Kubernetes

In a non-Kubernetes installation, **metrics are disabled by default**. You can enable them by setting the metrics
server address of the Tetragon Agent, for example to `:2112`, via the `--metrics-server` flag.
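
For example, when starting the agent manually (a minimal sketch; any other flags your environment requires are
omitted):

```shell
# Expose the metrics server on port 2112 on all interfaces
sudo tetragon --metrics-server :2112
```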

If using [systemd]({{< ref "/docs/installation/package" >}}), set the `metrics-address` entry in a file under the
`/etc/tetragon/tetragon.conf.d/` directory.
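
For example (a sketch that assumes the usual `tetragon.conf.d/` convention of one file per setting, named after the
setting and containing only its value):

```shell
echo ":2112" | sudo tee /etc/tetragon/tetragon.conf.d/metrics-address
sudo systemctl restart tetragon
```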

## Verify that metrics are exposed

To verify that the metrics server has started, check the logs of the Tetragon components.
Here's an example for the Tetragon Agent running on Kubernetes:

```shell
kubectl -n <tetragon-namespace> logs ds/tetragon
```
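
If you installed Tetragon via the systemd package instead, the agent logs can typically be inspected with
`journalctl` (assuming the package ships a `tetragon` unit):

```shell
journalctl -u tetragon -f
```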

The logs should contain a line confirming that the metrics server has started on the configured address.

To see what metrics are exposed, you can access the metrics endpoint directly.
In Kubernetes, forward the metrics port:

```shell
kubectl -n <tetragon-namespace> port-forward svc/tetragon 2112:2112
```

Access the `localhost:2112/metrics` endpoint either in a browser or, for example, using `curl`.
You should see a list of metrics similar to the following:

```
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="gathering"} 0
```
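
To quickly confirm that Tetragon-specific metrics are present, you can also filter the output on the command line
(assuming the port-forward above is still running):

```shell
curl -s http://localhost:2112/metrics | grep '^tetragon_' | head
```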

## Configure labels on events metrics

{{< note >}}

This feature is only available starting from Tetragon 1.13.

{{< /note >}}

Depending on the workloads running in the environment, [Events Metrics]({{< ref "/docs/reference/metrics#tetragon-events-metrics" >}})
may have very high cardinality. This is particularly likely in Kubernetes environments, where each pod creates
a separate timeseries. To avoid overwhelming Prometheus, Tetragon provides an option to choose which labels are
exposed on events metrics. You can configure the label filter via Helm values:

```yaml
tetragon:
  prometheus:
    metricsLabelFilter: "namespace,workload,binary" # "pod" label is disabled
```
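
As an illustration, with the filter above each events timeseries carries only the enabled labels. The sample below is
purely hypothetical (metric name and label values are made up for illustration):

```
tetragon_events_total{binary="/usr/bin/curl",namespace="default",type="PROCESS_EXEC",workload="my-app"} 42
```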

## Enable Prometheus ServiceMonitors

Typically, metrics are scraped by Prometheus or another compatible agent (for example the OpenTelemetry Collector),
stored in Prometheus or another compatible database, and then queried and visualized, for example with Grafana.

In Kubernetes, you can install Prometheus and Grafana using the
[Kube-Prometheus-Stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
Helm chart. This chart includes the [Prometheus Operator](https://prometheus-operator.dev/), which allows you to
configure Prometheus via Kubernetes custom resources.
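
For example, a minimal installation into a dedicated `monitoring` namespace could look as follows (release and
namespace names are only examples; setting `serviceMonitorSelectorNilUsesHelmValues=false` lets Prometheus also
discover `ServiceMonitors` that don't carry the chart's release label, see the warning below):

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
```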

Tetragon, in turn, comes with default `ServiceMonitor` resources containing the scrape configuration for the Agent
and the Operator. You can enable them via Helm values:

```yaml
tetragon:
  prometheus:
    serviceMonitor:
      enabled: true
tetragonOperator:
  prometheus:
    serviceMonitor:
      enabled: true
```

{{< warning >}}

By default, the Prometheus Operator only discovers `PodMonitors` and `ServiceMonitors` within its own namespace that
are labeled with the same release tag as the prometheus-operator release.

Hence, you need to configure it to also pick up Tetragon's `ServiceMonitor` resources, which usually don't reside in
the Prometheus namespace. Refer to the official
[Kube-Prometheus-Stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
documentation for more details.

{{< /warning >}}

To ensure that Prometheus has detected the Tetragon metrics endpoints, you can check the Prometheus targets:

1. Access the Prometheus UI.
2. Navigate to the "Status" tab and select "Targets".
3. Verify that the Tetragon metric endpoints are listed and their status is `UP`.
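
You can also verify from the command line that the Tetragon `ServiceMonitor` objects were actually created (assuming
the Prometheus Operator CRDs are installed in the cluster):

```shell
kubectl -n <tetragon-namespace> get servicemonitors.monitoring.coreos.com
```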
