Failed backup shows as completed when failure to read a volume occurs #1032
Comments
I realise that adding a |
Thanks for this new issue. A bit of background on why it currently happens this way: Restic (the tool we use underneath K8up) will continue trying to back up even if it runs into a "permission denied" or other error. It tracks these errors internally and provides a count of them at the end of the run. Restic then exits with exit code 3, which indicates that the backup might be incomplete because not all files could be read.

How we currently handle this in K8up is that we treat exit code 3 as successful, but we expose the

Having said that, there's room for improvement: if Restic exits with code 3, K8up can catch that and set a special condition on the backup object, something like "PartialBackupCompleted". That would make it more visible without the whole Prometheus setup.
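To make the suggestion above concrete, here is a minimal sketch of mapping Restic's exit code to a condition on the backup object. It is not K8up's actual code (K8up is written in Go). The enum, the function name, and the condition names BackupCompleted and BackupFailed are hypothetical; only "PartialBackupCompleted" comes from the comment above. The exit codes 0, 1 and 3 are the ones Restic documents for the backup command.

```python
from enum import Enum


class BackupCondition(Enum):
    """Hypothetical condition names, used for illustration only."""

    COMPLETED = "BackupCompleted"
    PARTIAL = "PartialBackupCompleted"  # name suggested in the comment above
    FAILED = "BackupFailed"


def condition_for_exit_code(exit_code: int) -> BackupCondition:
    """Map a restic backup exit code to a condition.

    0 -> snapshot created without read errors
    3 -> snapshot created, but some source data could not be read
    anything else -> treated as a failure in this sketch
    """
    if exit_code == 0:
        return BackupCondition.COMPLETED
    if exit_code == 3:
        return BackupCondition.PARTIAL
    return BackupCondition.FAILED
```

With a mapping like this, exit code 3 would no longer be folded into plain success, so the partial state would be visible on the object itself.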
I still think there exists a condition where failing to read the entire directory should be classed as a failed backup. Relying on the |
Restic doesn't have an exit code to distinguish those cases. It might be that it throws a However, we'd have to do some tests to verify that. If it returns also |
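Since the exit code alone cannot separate "a few files unreadable" from "the whole volume unreadable", one possible approach is to combine exit code 3 with the summary that Restic prints when run with --json. The sketch below assumes the summary message carries a total_files_processed field, and that an entirely unreadable volume yields a summary with zero files processed. The second assumption is unverified and would need exactly the kind of testing mentioned above. The function names are hypothetical.

```python
import json
from typing import Optional


def find_summary(json_lines: str) -> Optional[dict]:
    """Return the last message of type "summary" from line-delimited JSON output."""
    summary = None
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message (plain log lines, for example).
            continue
        if isinstance(message, dict) and message.get("message_type") == "summary":
            summary = message
    return summary


def is_failed_backup(exit_code: int, json_lines: str) -> bool:
    """Decide whether a run should count as failed.

    Exit code 0 is a success. Any code other than 0 or 3 is a failure.
    Exit code 3 is a failure only when no summary is found or when the
    summary reports that zero files were processed (assumption, see above).
    """
    if exit_code == 0:
        return False
    if exit_code != 3:
        return True
    summary = find_summary(json_lines)
    if summary is None:
        return True
    return summary.get("total_files_processed", 0) == 0
```

A partial backup that still processed some files would remain a non-failure here, which keeps the existing behaviour for the "a handful of files were unreadable" case.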
Description
If a volume mounted to the backup pod is unreadable, k8up will report an error during scan saying that the volume is unreadable. It then proceeds to the next step, checking for files, which also fails. The result is an empty snapshot.

The problem is that this is treated as a successful backup, which IMO is wrong. If I've asked to back up a volume, and the entire volume is determined to be unreadable, then this is a failure.
Additional Context
We discovered this while investigating why a volume snapshot was empty even though we had received no backup failure alerts: uselagoon/build-deploy-tool#361
The permission on the volume meant the backup pod user was unable to access it at all.
I know you've mentioned that k8up_backup_restic_last_errors contains some information on files failed etc. But we're talking about the entire volume in this case.

When looking at the backup pod that is created, the user is 65532 and the permissions on the volume mean it is not accessible to this user, and this results in the backup scan failing.

No files even get backed up in this instance, but the backup is still classed as a "success". Both of the logs show an error, and either of them should really result in a backup failure.
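To illustrate why user 65532 cannot scan such a volume, here is a small sketch of the classic POSIX permission check for listing a directory. It only looks at the mode bits, owner and group; it ignores ACLs, Linux capabilities, and anything the storage driver may enforce, so treat it as an approximation. The function name is hypothetical.

```python
import stat
from typing import Iterable


def can_list_directory(
    mode: int, owner_uid: int, owner_gid: int, uid: int, gids: Iterable[int]
) -> bool:
    """Return True if `uid` may list and enter a directory with the given mode.

    Listing and entering a directory needs both the read and execute bits.
    Exactly one permission class applies: owner, then group, then other.
    """
    if uid == 0:
        # Simplification: root normally bypasses these permission checks.
        return True
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in set(gids):
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed
```

For example, a volume root with mode 0o700 owned by some other user (say uid 1000, a made-up value) gives `can_list_directory(0o700, 1000, 1000, 65532, [65532])` a result of False, which corresponds to the case described here where the scan cannot even start.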
Logs
You can see the initial error here where the scan results in an error. And the subsequent archival process results in an error too.
Expected Behavior
If the volume is unreadable, I would expect the backup to fail. Even if other parts of the backup succeed.
Steps To Reproduce
uselagoon/build-deploy-tool#361
Version of K8up
v2.5.2
Version of Kubernetes
v1.31.0
Distribution of Kubernetes
EKS, GCP, AKS