-
Hey, I finally tried Coroot today and must say I love it! For whatever reason all instances show as down, with no metrics apparently. Any ideas? The config is more or less the default:

```yaml
apiVersion: coroot.com/v1
kind: Coroot
metadata:
  name: coroot
  namespace: coroot
spec:
  metricsRefreshInterval: 15s # Specifies the metric resolution interval.
  cacheTTL: 720h # Duration for which Coroot retains the metric cache.
  authBootstrapAdminPassword: admin-password # Initial admin password for bootstrapping.
```
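
In case it helps, this is roughly how I've been checking that the operator-managed pieces are actually running (the namespace matches my install; the pod name is a placeholder):

```shell
# All Coroot components live in the coroot namespace in my install
kubectl -n coroot get pods -o wide

# The node agent runs as a DaemonSet, so one pod per node is expected
kubectl -n coroot get daemonsets

# Tail a node-agent pod for errors (pick a pod name from the listing above)
kubectl -n coroot logs <node-agent-pod-name> --tail=100
```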
Thanks,
Replies: 10 comments
-
Hi @miran248, thank you for the report! This appears to be a bug. Could you please share the OS and kernel version used on the worker nodes?
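
Something like this should show both for all nodes (assuming you have kubectl access to the cluster):

```shell
# OS image and kernel version are listed for every node
kubectl get nodes -o wide
```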
-
Hey @def, thanks for the lightning reply! Here's the node info from one of the nodes:

```yaml
nodeInfo:
  architecture: amd64
  bootID: ...
  containerRuntimeVersion: containerd://1.7.24
  kernelVersion: 6.1.112+
  kubeProxyVersion: v1.30.8-gke.1051000
  kubeletVersion: v1.30.8-gke.1051000
  machineID: ...
  operatingSystem: linux
  osImage: Container-Optimized OS from Google
  systemUUID: ...
```

And a few labels (probably not important):

```text
beta.kubernetes.io/instance-type=c2-standard-4
cloud.google.com/gke-boot-disk=pd-ssd
cloud.google.com/gke-container-runtime=containerd
cloud.google.com/gke-cpu-scaling-level=4
cloud.google.com/gke-logging-variant=DEFAULT
cloud.google.com/gke-max-pods-per-node=110
cloud.google.com/gke-memory-gb-scaling-level=16
cloud.google.com/gke-netd-ready=true
cloud.google.com/gke-os-distribution=cos
cloud.google.com/gke-provisioning=standard
cloud.google.com/gke-stack-type=IPV4
cloud.google.com/machine-family=c2
cloud.google.com/private-node=false
iam.gke.io/gke-metadata-server-enabled=true
kubernetes.io/arch=amd64
kubernetes.io/os=linux
```
-
Thanks! We have a few hypotheses and will validate them on the same infrastructure. We'll keep you posted.
-
Thanks! Btw, here's the Argo CD app for redis, just in case:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-hasura-redis
  namespace: argocd
spec:
  destination:
    namespace: production
    server: "https://kubernetes.default.svc"
  project: default
  source:
    chart: redis
    repoURL: https://charts.bitnami.com/bitnami
    targetRevision: 20.3.0
    helm:
      releaseName: production-hasura-redis
      values: |
        architecture: replication
        auth:
          enabled: false
        master:
          count: 1
          disableCommands: []
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
          nodeSelector:
            env: shared
          persistence:
            enabled: true
            size: 1Gi
        sentinel:
          enabled: true
          masterSet: production-hasura-redis
          masterService:
            enabled: true
          persistence:
            enabled: true
            size: 1Gi
        replica:
          automountServiceAccountToken: true
        rbac:
          create: true
        metrics:
          enabled: true
  syncPolicy:
    automated:
      prune: true
      # selfHeal: true
```
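
For what it's worth, this is roughly how I've been confirming that the chart's metrics sidecar actually made it into the pods (the label value just mirrors the release name above):

```shell
# metrics.enabled: true should add a redis-exporter sidecar listening on 9121;
# list the containers of every pod from this release to confirm it's there
kubectl -n production get pods -l app.kubernetes.io/instance=production-hasura-redis \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}'
```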
-
I couldn't reproduce this issue on my GKE cluster (screenshot attached). Could you please provide the output of `kubectl describe pod` for one of the redis pods?
-
Hey, here's the output from `kubectl describe pod`:

```text
Name: production-hasura-redis-node-2
Namespace: production
Priority: 0
Service Account: production-hasura-redis
Node: gke-***-420faeb3-spii/10.154.15.233
Start Time: Mon, 20 Jan 2025 17:53:38 +0100
Labels: app.kubernetes.io/component=node
app.kubernetes.io/instance=production-hasura-redis
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=redis
app.kubernetes.io/version=7.4.2
apps.kubernetes.io/pod-index=2
controller-revision-hash=production-hasura-redis-node-c9d4b98d
helm.sh/chart=redis-20.6.3
isMaster=true
statefulset.kubernetes.io/pod-name=production-hasura-redis-node-2
Annotations: checksum/configmap: a70e7bd40189c2fa6ebc89b540d827e69bede945408cb8a65732f7074c5adfbf
checksum/health: 7c1bc273685f377b654b2d5d82e81c96f6c59d6963786575eccc63a56d63a6cd
checksum/scripts: aeff7634605d8df25975c509533bdf36aada5d284afc6c1d50572ece5034313c
checksum/secret: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
prometheus.io/port: 9121
prometheus.io/scrape: true
Status: Running
IP: 10.76.2.61
IPs:
IP: 10.76.2.61
Controlled By: StatefulSet/production-hasura-redis-node
Containers:
redis:
Container ID: containerd://89bd14917a4f81157618ce2694df1f63a4248fd679ef891f39531b1fba431ab2
Image: docker.io/bitnami/redis:7.4.2-debian-12-r0
Image ID: docker.io/bitnami/redis@sha256:65f55fefc0acd7f1a1da44b39be3044bcfbc03f4a49c4689453097f929f07132
Port: 6379/TCP
Host Port: 0/TCP
SeccompProfile: RuntimeDefault
Command:
/bin/bash
Args:
-c
/opt/bitnami/scripts/start-scripts/start-node.sh
State: Running
Started: Mon, 20 Jan 2025 17:53:43 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 256Mi
Liveness: exec [sh -c /health/ping_liveness_local.sh 5] delay=20s timeout=5s period=5s #success=1 #failure=5
Readiness: exec [sh -c /health/ping_readiness_local.sh 1] delay=20s timeout=1s period=5s #success=1 #failure=5
Startup: exec [sh -c /health/ping_liveness_local.sh 5] delay=10s timeout=5s period=10s #success=1 #failure=22
Environment:
BITNAMI_DEBUG: false
REDIS_MASTER_PORT_NUMBER: 6379
ALLOW_EMPTY_PASSWORD: yes
REDIS_TLS_ENABLED: no
REDIS_PORT: 6379
REDIS_SENTINEL_TLS_ENABLED: no
REDIS_SENTINEL_PORT: 26379
REDIS_DATA_DIR: /data
STAKATER_PRODUCTION_HASURA_REDIS_SCRIPTS_CONFIGMAP: 73e9131546af07c4fc943bc9d908c071d7a5aa1b
Mounts:
/data from redis-data (rw)
/health from health (rw)
/opt/bitnami/redis-sentinel/etc from sentinel-data (rw)
/opt/bitnami/redis/etc from empty-dir (rw,path="app-conf-dir")
/opt/bitnami/redis/mounted-etc from config (rw)
/opt/bitnami/scripts/start-scripts from start-scripts (rw)
/tmp from empty-dir (rw,path="tmp-dir")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lbjmm (ro)
sentinel:
Container ID: containerd://d424309678bf768f473b736c0d63125a58c6918ad033fbe9d76d156c56b143ca
Image: docker.io/bitnami/redis-sentinel:7.4.2-debian-12-r0
Image ID: docker.io/bitnami/redis-sentinel@sha256:f2953d5e62b386bb2985043907f5c8af51b8466a9f9e1fc16fd0a500624bad46
Port: 26379/TCP
Host Port: 0/TCP
SeccompProfile: RuntimeDefault
Command:
/bin/bash
Args:
-c
/opt/bitnami/scripts/start-scripts/start-sentinel.sh
State: Running
Started: Mon, 20 Jan 2025 17:53:46 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 256Mi
Liveness: exec [sh -c /health/ping_sentinel.sh 5] delay=20s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [sh -c /health/ping_sentinel.sh 1] delay=20s timeout=1s period=5s #success=1 #failure=6
Startup: exec [sh -c /health/ping_sentinel.sh 5] delay=10s timeout=5s period=10s #success=1 #failure=22
Environment:
BITNAMI_DEBUG: false
ALLOW_EMPTY_PASSWORD: yes
REDIS_SENTINEL_TLS_ENABLED: no
REDIS_SENTINEL_PORT: 26379
Mounts:
/data from redis-data (rw)
/etc/shared from kubectl-shared (rw)
/health from health (rw)
/opt/bitnami/redis-sentinel/etc from sentinel-data (rw)
/opt/bitnami/redis-sentinel/mounted-etc from config (rw)
/opt/bitnami/scripts/start-scripts from start-scripts (rw)
/tmp from empty-dir (rw,path="tmp-dir")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lbjmm (ro)
metrics:
Container ID: containerd://f8fc5a00cd2b97193213325bf2407f23622997247bd00720833362ddd822e405
Image: docker.io/bitnami/redis-exporter:1.67.0-debian-12-r0
Image ID: docker.io/bitnami/redis-exporter@sha256:5bc3229b94f62b593600ee74d0cd16c7a74df31852eb576bdc0f5e663c8e1337
Port: 9121/TCP
Host Port: 0/TCP
SeccompProfile: RuntimeDefault
Command:
/bin/bash
-c
if [[ -f '/secrets/redis-password' ]]; then
export REDIS_PASSWORD=$(cat /secrets/redis-password)
fi
redis_exporter
State: Running
Started: Mon, 20 Jan 2025 17:53:50 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 150m
ephemeral-storage: 2Gi
memory: 192Mi
Requests:
cpu: 100m
ephemeral-storage: 50Mi
memory: 128Mi
Liveness: tcp-socket :metrics delay=10s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:metrics/ delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
REDIS_ALIAS: production-hasura-redis
REDIS_EXPORTER_WEB_LISTEN_ADDRESS: :9121
Mounts:
/tmp from empty-dir (rw,path="tmp-dir")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lbjmm (ro)
kubectl-shared:
Container ID: containerd://10eb028dffcac550ff580c2cc7edee4468b1edb11a3e80615759295e317b276e
Image: docker.io/bitnami/kubectl:1.32.0-debian-12-r0
Image ID: docker.io/bitnami/kubectl@sha256:493d1b871556d48d6b25d471f192c2427571cd6f78523eebcaf4d263353c7487
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
Command:
/opt/bitnami/scripts/kubectl-scripts/update-master-label.sh
State: Running
Started: Mon, 20 Jan 2025 17:53:56 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/shared from kubectl-shared (rw)
/opt/bitnami/scripts/kubectl-scripts from kubectl-scripts (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lbjmm (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
redis-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: redis-data-production-hasura-redis-node-2
ReadOnly: false
sentinel-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: sentinel-data-production-hasura-redis-node-2
ReadOnly: false
start-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: production-hasura-redis-scripts
Optional: false
health:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: production-hasura-redis-health
Optional: false
kubectl-shared:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubectl-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: production-hasura-redis-kubectl-scripts
Optional: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: production-hasura-redis-configuration
Optional: false
empty-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-lbjmm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
```

(I upgraded redis to ...) Initially I had issues where Coroot couldn't access metrics on some services; that was because I had the metrics endpoints disabled. That has since been fixed and those errors went away. The status, however, hasn't changed: all instances (including redis) are still down.
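
To rule out the exporter itself, I also poked the metrics endpoint directly; a quick check along these lines (pod name taken from the describe output above):

```shell
# Forward the redis-exporter port locally and scrape it once
kubectl -n production port-forward pod/production-hasura-redis-node-2 9121:9121 &
sleep 2
curl -s http://localhost:9121/metrics | head -n 20
kill %1
```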
-
Could you also share a screenshot of the Memory tab for the redis app?
-
(screenshot of the Memory tab)
-
Btw, now that everything's up, the restart counts have sorted themselves out as well; they're all at 0 now. This is the last 60 minutes.
-
Today, we released a new version of the agent that includes several fixes. While I didn't expect this update to resolve your issue, the agent restarted because our operator automatically checks for new versions every hour and updates the components as needed.

Let's mark this as resolved for now, but feel free to reopen the issue or raise a new one if anything goes wrong. Thanks again for your report!
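
If you ever want to see which agent version the operator has rolled out, or bounce the agents without waiting for the hourly check, something along these lines should do it (the DaemonSet name below assumes a default operator-managed install):

```shell
# Image currently set on the node-agent DaemonSet managed by the operator
kubectl -n coroot get daemonset coroot-node-agent \
  -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'

# Recreate the agent pods (the operator rolls them automatically on version bumps,
# so this is only needed to force a restart right away)
kubectl -n coroot rollout restart daemonset/coroot-node-agent
```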