cookielab/terraform-kubernetes-grafana-alloy

Grafana Alloy Terraform Module

This module deploys Grafana Alloy to collect metrics/traces/logs from various sources in a Kubernetes cluster.

Overview

The module is designed for flexible deployment of Grafana Alloy with different configurations:

  • Cluster Module - Collects metrics from Kubernetes cluster (pods, services, kubelet, cAdvisor)
  • Node Module - Collects node-level metrics using node_exporter
  • Kafka Module - Collects JMX metrics from Kafka brokers
  • AWS Module - Collects metrics from AWS services via CloudWatch
  • Single Module - Collects traces and metrics using the OpenTelemetry protocol, plus Prometheus alert rules, all of which need a single point of processing
  • OpenTelemetry Collector Module - Collects telemetry data (traces and metrics) using the OpenTelemetry protocol and forwards them to Grafana Tempo and Mimir backends
  • Loki Logs Module - Collects logs from Kubernetes pods and forwards them to Loki with support for annotation-based filtering and multi-tenancy

Architecture

The module supports:

  • Scaling to multiple replicas for high availability
  • Clustering for load distribution
  • Flexible configuration using River format
  • Sending metrics to Prometheus-compatible endpoints
  • Sending logs to Loki
  • Collection of traces and metrics via the OpenTelemetry protocol
  • Support for OpenTelemetry Collector deployment and configuration
  • Configurable resource limits for agents

Modules

The module contains the following submodules:

  • cluster - For collecting Kubernetes metrics
  • node - For collecting system metrics from nodes
  • kafka - For collecting Kafka JMX metrics
  • aws - For collecting AWS CloudWatch metrics
  • single - For collecting OpenTelemetry traces and metrics, plus Prometheus alert rules, all of which need a single point of processing
  • otel-collector - For collecting traces and metrics using the OpenTelemetry protocol
  • loki-logs - For collecting and forwarding Kubernetes pod logs to Loki

Each module can be used independently or in combination based on requirements.
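
As a rough sketch of combining submodules, the following deploys the cluster submodule for metrics next to the loki-logs submodule for logs. The module labels, cluster name, namespace, agent name, and endpoints are placeholders, and the sketch assumes both submodules accept the inputs used in the Usage section.

```hcl
# Metrics: a clustered Alloy deployment using the k8s_pods template.
module "alloy_metrics" {
  source = "./modules/cluster"

  kubernetes_cluster_name = "somecluster" # placeholder cluster name
  kubernetes_namespace    = "monitoring"  # the namespace must already exist

  agent_name         = "metrics" # placeholder agent name
  clustering_enabled = true
  replicas           = 2

  config = [<<-EOF
    k8s_pods "my" {
      metrics_output = prometheus.remote_write.default.receiver
    }
    EOF
  ]

  metrics = {
    endpoint = "https://mimir.example.com:443/api/v1/push" # placeholder endpoint
  }
}

# Logs: a separate Alloy deployment forwarding pod logs to Loki.
module "alloy_logs" {
  source = "./modules/loki-logs"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "monitoring"

  loki = {
    url       = "http://loki-gateway.monitoring.svc.cluster.local:80/loki/api/v1/push"
    tenant_id = "default"
  }
}
```

Keeping metrics and logs in separate module instances lets each one be scaled and configured on its own.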

Usage

Cluster Module for k8s metrics

module "grafana_alloy_k8s" {
  source  = "./modules/cluster"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "cluster-apps"

  agent_name         = "clustered"
  clustering_enabled = true
  replicas           = 3

  config = [<<-EOF
    k8s_pods "my" {
      metrics_output = prometheus.remote_write.default.receiver
    }

    k8s_services "my" {
      metrics_output = prometheus.remote_write.default.receiver
    }

    k8s_cadvisor "my" {
      metrics_output = prometheus.remote_write.default.receiver
    }

    k8s_kubelet "my" {
      metrics_output = prometheus.remote_write.default.receiver
    }
    EOF
  ]

  metrics = {
    endpoint = "https://mimir.example.com:443/api/v1/push"
  }
}
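
The pod scrape behaviour can presumably be tuned through the k8s_pods input listed under Inputs. The sketch below spells out the documented defaults explicitly; the module label, agent name, and tenant name are hypothetical, and the comment on how the two settings interact is an assumption rather than documented behaviour.

```hcl
module "grafana_alloy_k8s_annotated" {
  source = "./modules/cluster"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "cluster-apps"

  agent_name = "annotated" # hypothetical agent name

  config = [<<-EOF
    k8s_pods "my" {
      metrics_output = prometheus.remote_write.default.receiver
    }
    EOF
  ]

  # Assumption: with scrape_pods_global = false, only pods carrying the
  # annotation below are scraped. Both values match the documented defaults.
  k8s_pods = {
    scrape_pods_global     = false
    scrape_pods_annotation = "prometheus_io_scrape"
  }

  metrics = {
    endpoint = "https://mimir.example.com:443/api/v1/push"
    tenant   = "team-a" # hypothetical tenant
  }
}
```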

OTel example

module "grafana_alloy_otel" {
  source  = "./modules/otel-collector"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "cluster-apps"

  agent_name = "otel"

  config = [<<-EOF
    otel_process "my" {
      metrics_output = prometheus.remote_write.default.receiver
      traces_output  = otelcol.exporter.otlphttp.default.input
    }
    EOF
  ]

  metrics = {
    endpoint = "https://mimir.example.com:443/api/v1/push"
  }

  otel = {
    enabled  = true
    endpoint = "https://tempo.example.com:443"
  }
}

NOTE: OTel components are not cluster-capable, and some signals (e.g. traces) require a single point of processing.
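
One way to respect this constraint is to pin the OTel deployment to a single, non-clustered replica. This is only a sketch: it assumes the otel-collector submodule exposes the kubernetes_kind, clustering_enabled, and replicas inputs listed under Inputs, and the module label is hypothetical.

```hcl
module "grafana_alloy_otel_single" {
  source = "./modules/otel-collector"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "cluster-apps"

  agent_name = "otel"

  # Single point of processing: one Deployment replica, clustering off.
  # These values match the documented defaults and are set here only to
  # make the intent explicit.
  kubernetes_kind    = "deployment"
  clustering_enabled = false
  replicas           = 1

  # The config list (e.g. the otel_process block from the OTel example)
  # is omitted here for brevity.

  otel = {
    endpoint  = "https://tempo.example.com:443"
    http_port = 4318 # documented default
    grpc_port = 4317 # documented default
  }
}
```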

Loki Logs Module for k8s pod logs

module "grafana_alloy_loki_logs" {
  source = "./modules/loki-logs"

  kubernetes_cluster_name = "utils"
  kubernetes_namespace    = "monitoring"

  loki = {
    url       = "http://loki-gateway.monitoring.svc.cluster.local:80/loki/api/v1/push"
    tenant_id = "default"
  }
}
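
Annotation-based filtering and multi-tenancy can presumably be configured through the remaining attributes of the loki input. In this sketch the module label and the tenant "team-a" are hypothetical, and the behaviour described in the comments is inferred from the attribute names rather than confirmed.

```hcl
module "grafana_alloy_loki_logs_team_a" {
  source = "./modules/loki-logs"

  kubernetes_cluster_name = "utils"
  kubernetes_namespace    = "monitoring"

  loki = {
    url       = "http://loki-gateway.monitoring.svc.cluster.local:80/loki/api/v1/push"
    tenant_id = "team-a" # hypothetical tenant

    # Assumption: disabling global scraping restricts collection to pods
    # that carry the annotation below (the documented default name).
    scrape_pods_global     = false
    scrape_pods_annotation = "loki.logs.enabled"
  }
}
```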

For working examples, see the submodules.

Kubernetes resource usage

  agent_resources = {
    requests = {
      cpu    = "100m"
      memory = "100Mi"
    }
    limits = {
      cpu    = "1"
      memory = "1Gi"
    }
  }

Note: when limits are not defined, the requests values are also used as the limits.
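
To illustrate the fallback described in the note above, the sketch below sets only requests. The module label, agent name, and values are illustrative, and the effective limits in the comment follow from the note rather than from a tested deployment.

```hcl
module "grafana_alloy_requests_only" {
  source = "./modules/cluster"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "cluster-apps"

  agent_name = "requests-only" # hypothetical agent name

  agent_resources = {
    requests = {
      cpu    = "200m"
      memory = "256Mi"
    }
    # limits omitted on purpose: per the note above, the effective limits
    # should then equal the requests (cpu = 200m, memory = 256Mi).
  }
}
```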

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.3.0, < 2.0.0 |
| helm | >= 2.0.0 |
| kubernetes | >= 2.0.0 |

Providers

| Name | Version |
|------|---------|
| helm | >= 2.0.0 |
| kubernetes | >= 2.0.0 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| helm_release.grafana_alloy | resource |
| kubernetes_config_map_v1.grafana_alloy | resource |
| kubernetes_secret_v1.grafana_alloy | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| agent_name | Name of the Grafana Alloy. | string | n/a | yes |
| agent_resources | Resources for the Grafana Alloy | object({ requests = optional(object({ cpu = optional(string, "100m"), memory = optional(string, "256Mi") }), {}), limits = optional(object({ cpu = optional(string, null), memory = optional(string, null) }), {}) }) | {} | no |
| chart_version | Helm chart version of Grafana Alloy | string | "1.0.2" | no |
| clustering_enabled | Enable Grafana Alloy clustering. NOTE: This is only supported for certain kinds of resources; see the Grafana Alloy documentation. | bool | false | no |
| config | Grafana Alloy River configuration. Some configuration should be provided. You're encouraged to use the provided templates. You can also supply an entirely custom config by setting default_config_enabled = false. | list(string) | [] | no |
| controller_resources | Resources for the Grafana Alloy controller | object({ requests = optional(object({ cpu = optional(string, "1m"), memory = optional(string, "5Mi") }), {}), limits = optional(object({ cpu = optional(string, "100m"), memory = optional(string, "50Mi") }), {}) }) | {} | no |
| default_config_enabled | Enable default Grafana Alloy config templates. NOTE: Set this to false only if you want to use your own config without the enclosed templates. | bool | true | no |
| envs | Additional environment variables for the Grafana Alloy. You can use this attribute to provide additional secrets without exposing them in the config map output. | map(string) | {} | no |
| global_tolerations | Global tolerations for the Grafana Alloy | list(object({ key = string, operator = string, value = optional(string), effect = string, tolerationSeconds = optional(number) })) | [] | no |
| host_volumes | Extra volumes to mount to the Grafana Alloy. This is needed for some integrations like node_exporter. | list(object({ name = string, host_path = string, mount_path = string })) | [] | no |
| iam_role_arn | IAM role ARN to be assumed by the CloudWatch exporter | string | "" | no |
| image | Image registry for Grafana Alloy. This is meant to be used with custom pull-through proxies/registries. | object({ registry = optional(string, "docker.io"), repository = optional(string, "grafana/alloy") }) | {} | no |
| integrations | Grafana Alloy integrations configuration | object({ otel_collector = optional(bool, false), loki_logs = optional(bool, false), k8s_cadvisor = optional(bool, false), k8s_kubelet = optional(bool, false), k8s_mimir_rules = optional(bool, false), k8s_pods = optional(bool, false), k8s_services = optional(bool, false), node_exporter = optional(bool, false), aws_alb = optional(bool, false), aws_rds = optional(bool, false), aws_sqs = optional(bool, false), aws_mq = optional(bool, false), aws_opensearch = optional(bool, false), remote_write_metrics = optional(bool, true), kafka_jmx_metrics = optional(bool, false) }) | {} | no |
| k8s_pods | Grafana Alloy scrape settings for K8S pods | object({ scrape_pods_global = optional(bool, false), scrape_pods_annotation = optional(string, "prometheus_io_scrape") }) | {} | no |
| kafka_jmx_metrics | Grafana Alloy scrape settings for Kafka JMX metrics | object({ scrape_interval = optional(string, "1m"), scrape_timeout = optional(string, "30s"), scrape_period = optional(string, "1m"), kafka_broker_list = optional(list(string), []), distinguisher = optional(string, "default"), metrics_endpoint_path = optional(string, "/metrics") }) | {} | no |
| kubernetes_cluster_name | Kubernetes cluster name. NOTE: This gets injected into labels/attributes of all collected data. | string | n/a | yes |
| kubernetes_kind | Grafana Alloy Kubernetes resource kind. Valid values are "deployment" or "daemonset". If you want to use clustering, you should use "deployment" with multiple replicas. | string | "deployment" | no |
| kubernetes_namespace | Kubernetes namespace to deploy the Grafana Alloy into. NOTE: The namespace must exist and be available for deployment! | string | n/a | yes |
| kubernetes_security_context | Kubernetes security context configuration for the Grafana Alloy. This is needed with node_exporter to run privileged and as root (UID 0). | object({ runAsUser = optional(number), privileged = optional(bool) }) | {} | no |
| live_debug | Enable live debug for the Grafana Alloy | bool | false | no |
| loki | Grafana Alloy scrape settings for Loki logs | object({ url = optional(string, "http://loki:3100"), tenant_id = optional(string, "default"), username = optional(string, "admin"), password = optional(string, "admin"), auth_enabled = optional(bool, false), scrape_pods_global = optional(bool, true), scrape_pods_annotation = optional(string, "loki.logs.enabled") }) | {} | no |
| metrics | Grafana Alloy metrics endpoint of Prometheus-compatible receiver. NOTE: You must provide the base URL of the API. | object({ endpoint = optional(string, "http://mimir:9090"), tenant = optional(string, "default"), backend_type = optional(string, "mimir"), ssl_enabled = optional(bool, true) }) | {} | no |
| otel | Grafana Alloy OTel configuration. NOTE: There can be only one OTel receiver at the moment. | object({ http_port = optional(number, 4318), grpc_port = optional(number, 4317), endpoint = optional(string, "http://tempo:4318"), service_graphs_dimensions = optional(list(string), []) }) | {} | no |
| replicas | Number of Grafana Alloy replicas. NOTE: Only valid for kubernetes_kind = "deployment". | number | 1 | no |
| stability_level | n/a | string | "generally-available" | no |
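
Several of these inputs are easier to read in context. The sketch below combines image, envs, and global_tolerations; the registry host, the variable name, the environment variable name, and the taint key are all hypothetical, and it assumes the cluster submodule passes these inputs through unchanged.

```hcl
# Hypothetical secret supplied from outside the configuration.
variable "mimir_api_token" {
  type      = string
  sensitive = true
}

module "grafana_alloy_custom" {
  source = "./modules/cluster"

  kubernetes_cluster_name = "somecluster"
  kubernetes_namespace    = "cluster-apps"

  agent_name = "custom" # hypothetical agent name

  # Pull the image through a private proxy instead of docker.io.
  image = {
    registry   = "registry.example.com" # hypothetical pull-through registry
    repository = "grafana/alloy"
  }

  # Passed as environment variables, so the value does not appear in the
  # rendered config map.
  envs = {
    MIMIR_API_TOKEN = var.mimir_api_token
  }

  # Allow scheduling onto nodes tainted for monitoring workloads.
  global_tolerations = [
    {
      key      = "dedicated" # hypothetical taint key
      operator = "Equal"
      value    = "monitoring"
      effect   = "NoSchedule"
    }
  ]
}
```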

Outputs

| Name | Description |
|------|-------------|
| otel_endpoints | Exposed OTel endpoints |
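
The output can be re-exported from the calling configuration so that other stacks can look up where to send telemetry. This sketch assumes the module label grafana_alloy_otel from the OTel example and that the otel-collector submodule exposes this output; the structure of the value is not documented here, so it is passed through as-is.

```hcl
output "alloy_otel_endpoints" {
  description = "OTel endpoints exposed by the Grafana Alloy OTel deployment"
  value       = module.grafana_alloy_otel.otel_endpoints
}
```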