vmcluster: improve components readiness check #1907

Merged 8 commits on Jan 9, 2025
7 changes: 6 additions & 1 deletion charts/victoria-metrics-cluster/CHANGELOG.md
@@ -1,6 +1,11 @@
 ## Next release

-- TODO
+**Update note**: A default `minReadySeconds` has been added for the vmstorage statefulset; vmstorage pods will restart after the upgrade.
+**Update note**: The default probes of vminsert, vmselect, vmauth, and vmstorage have been changed; all pods will restart after the upgrade.
+
+- Remove the vmstorage liveness probe: vminsert already handles routing and retries, while liveness probes can inadvertently introduce delays, DNS instability, and unnecessary disruptions.
+- Reduce the default readiness probe interval to 5s (was 15s) and raise the failure threshold to 10 (was 3).
+- Add a default `minReadySeconds` for vmstorage to help stabilize the service during rollouts.

 ## 0.16.2
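
The timing effect of the new readiness defaults can be worked out from the numbers in the changelog. The fragment below is a sketch that writes the new vmstorage defaults as an explicit values override, using the `vmstorage.probe.readiness` layout from the chart's values.yaml. The timing comments are rough estimates: they ignore `timeoutSeconds` and assume the Kubernetes default `successThreshold` of 1.

```yaml
# Sketch only: the new defaults spelled out, with approximate timing notes.
vmstorage:
  # A pod must stay Ready for 5s before the rollout counts it as available.
  minReadySeconds: 5
  probe:
    readiness:
      httpGet: {}
      initialDelaySeconds: 5
      # Probe every 5s (was 15s): a recovered pod can pass a probe and be
      # marked Ready again within about 5s instead of about 15s.
      periodSeconds: 5
      timeoutSeconds: 5
      # Mark the pod unready after 10 consecutive failures (was 3):
      # roughly 10 * 5s = 50s of failures, versus 3 * 15s = 45s before.
      failureThreshold: 10
```

The time to mark a failing pod unready therefore stays about the same, while recovery is detected roughly three times faster.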
16 changes: 8 additions & 8 deletions charts/victoria-metrics-cluster/README.md
@@ -1591,10 +1591,10 @@ labels: {}
 tcpSocket: {}
 timeoutSeconds: 5
 readiness:
-failureThreshold: 3
+failureThreshold: 10
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 startup: {}
 </code>
@@ -2531,10 +2531,10 @@ labels: {}
 tcpSocket: {}
 timeoutSeconds: 5
 readiness:
-failureThreshold: 3
+failureThreshold: 10
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 startup: {}
 </code>
@@ -3435,10 +3435,10 @@ labels: {}
 tcpSocket: {}
 timeoutSeconds: 5
 readiness:
-failureThreshold: 3
+failureThreshold: 10
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 startup: {}
 </code>
@@ -3952,11 +3952,11 @@ loggerFormat: json
 port: manager-http
 timeoutSeconds: 5
 readiness:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 port: manager-http
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 startup: {}
 </code>

@@ -250,6 +250,7 @@ spec:
 {{- end }}
 {{- include "vm.license.volume" . | nindent 8 }}
 {{- end }}
+minReadySeconds: {{ $app.minReadySeconds }}
 {{- if and $app.persistentVolume.enabled (not $app.persistentVolume.existingClaim) }}
 volumeClaimTemplates:
 - apiVersion: v1

@@ -49,13 +49,13 @@ deployment should match snapshot:
 - containerPort: 8427
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 volumeMounts:
 - mountPath: /config
@@ -118,13 +118,13 @@ deployment should match snapshot with fullnameOverride, extraLabels and podLabel
 - containerPort: 8427
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 volumeMounts:
 - mountPath: /config

@@ -50,13 +50,13 @@ deployment should match snapshot:
 - containerPort: 8480
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 serviceAccountName: RELEASE-NAME-victoria-metrics-cluster
 deployment should match snapshot with fullnameOverride, extraLabels and podLabels:
@@ -113,12 +113,12 @@ deployment should match snapshot with fullnameOverride, extraLabels and podLabel
 - containerPort: 8480
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 serviceAccountName: RELEASE-NAME-victoria-metrics-cluster

@@ -51,13 +51,13 @@ deployment should match snapshot:
 - containerPort: 8481
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 securityContext: {}
 volumeMounts:
@@ -123,13 +123,13 @@ deployment should match snapshot with fullnameOverride, extraLabels and podLabel
 - containerPort: 8481
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 securityContext: {}
 volumeMounts:
@@ -197,13 +197,13 @@ statefulset should match snapshot:
 - containerPort: 8481
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 securityContext: {}
 volumeMounts:
@@ -273,13 +273,13 @@ statefulset should match snapshot with fullnameOverride, extraLabels and podLabe
 - containerPort: 8481
 name: http
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 securityContext: {}
 volumeMounts:

@@ -13,6 +13,7 @@ statefulset should match snapshot:
 name: RELEASE-NAME-victoria-metrics-cluster-vmstorage
 namespace: NAMESPACE
 spec:
+minReadySeconds: 5
 podManagementPolicy: OrderedReady
 replicas: 2
 selector:
@@ -40,13 +41,6 @@ statefulset should match snapshot:
 - --storageDataPath=/storage
 image: victoriametrics/vmstorage:0.1.0-cluster
 imagePullPolicy: IfNotPresent
-livenessProbe:
-failureThreshold: 10
-initialDelaySeconds: 30
-periodSeconds: 30
-tcpSocket:
-port: http
-timeoutSeconds: 5
 name: vmstorage
 ports:
 - containerPort: 8482
@@ -56,13 +50,13 @@ statefulset should match snapshot:
 - containerPort: 8401
 name: vmselect
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 volumeMounts:
 - mountPath: /storage
@@ -96,6 +90,7 @@ statefulset should match snapshot with fullnameOverride, extraLabels and podLabe
 name: vmstorage-node
 namespace: NAMESPACE
 spec:
+minReadySeconds: 5
 podManagementPolicy: OrderedReady
 replicas: 2
 selector:
@@ -124,13 +119,6 @@ statefulset should match snapshot with fullnameOverride, extraLabels and podLabe
 - --storageDataPath=/storage
 image: victoriametrics/vmstorage:0.1.0-cluster
 imagePullPolicy: IfNotPresent
-livenessProbe:
-failureThreshold: 10
-initialDelaySeconds: 30
-periodSeconds: 30
-tcpSocket:
-port: http
-timeoutSeconds: 5
 name: vmstorage
 ports:
 - containerPort: 8482
@@ -140,13 +128,13 @@ statefulset should match snapshot with fullnameOverride, extraLabels and podLabe
 - containerPort: 8401
 name: vmselect
 readinessProbe:
-failureThreshold: 3
+failureThreshold: 10
 httpGet:
 path: /health
 port: http
 scheme: HTTP
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
 volumeMounts:
 - mountPath: /storage
28 changes: 11 additions & 17 deletions charts/victoria-metrics-cluster/values.yaml
@@ -109,9 +109,9 @@ vmselect:
 readiness:
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
-failureThreshold: 3
+failureThreshold: 10
 # -- VMSelect liveness probe
 liveness:
 tcpSocket: {}
@@ -389,9 +389,9 @@ vminsert:
 readiness:
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
-failureThreshold: 3
+failureThreshold: 10
 # -- VMInsert liveness probe
 liveness:
 tcpSocket: {}
@@ -639,9 +639,9 @@ vmauth:
 readiness:
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
-failureThreshold: 3
+failureThreshold: 10
 # -- VMAuth liveness probe
 liveness:
 tcpSocket: {}
@@ -1021,20 +1021,14 @@ vmstorage:
 ipFamilies: []
 # -- Pod's termination grace period in seconds
 terminationGracePeriodSeconds: 60
-# -- Readiness & Liveness probes
+minReadySeconds: 5
+# -- Readiness probes
 probe:
 # -- VMStorage readiness probe
 readiness:
 httpGet: {}
 initialDelaySeconds: 5
-periodSeconds: 15
-timeoutSeconds: 5
-failureThreshold: 3
-# -- VMStorage liveness probe
-liveness:
-tcpSocket: {}
-initialDelaySeconds: 30
-periodSeconds: 30
+periodSeconds: 5
 timeoutSeconds: 5
 failureThreshold: 10
 # -- VMStorage startup probe
@@ -1108,9 +1102,9 @@ vmstorage:
 httpGet:
 port: manager-http
 initialDelaySeconds: 5
-periodSeconds: 15
+periodSeconds: 5
 timeoutSeconds: 5
-failureThreshold: 3
+failureThreshold: 10
 # -- VMBackupManager liveness probe
 liveness:
 tcpSocket:
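
Because the upgrade restarts pods and drops the vmstorage liveness probe, some operators may want to pin the previous behavior while they evaluate the change. The override below is a sketch built from the lines this diff removes from values.yaml. Whether the chart templates still render `vmstorage.probe.liveness` after this change is an assumption the diff does not confirm, so inspect the rendered manifest (for example with `helm template`) before relying on it.

```yaml
# Hypothetical override file that restores the pre-upgrade vmstorage settings.
vmstorage:
  # 0 is the Kubernetes default: a pod counts as available as soon as it is Ready.
  minReadySeconds: 0
  probe:
    readiness:
      httpGet: {}
      initialDelaySeconds: 5
      periodSeconds: 15      # previous default
      timeoutSeconds: 5
      failureThreshold: 3    # previous default
    # Previous liveness defaults, copied from the removed values.yaml lines.
    # Assumption: the templates still honor this key for vmstorage.
    liveness:
      tcpSocket: {}
      initialDelaySeconds: 30
      periodSeconds: 30
      timeoutSeconds: 5
      failureThreshold: 10
```

Pinning these values does not avoid the restart noted in the changelog if the rendered pod spec still differs from the running one.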