From 62667bfa7c34ec009266f35272e62f0dd4b089a5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aindri=C3=BA=20Lavelle?= <121855584+aindriu-aiven@users.noreply.github.com>
Date: Mon, 27 Jan 2025 10:23:43 +0000
Subject: [PATCH] Adding information for handling S3 Sink multipart upload aborted parts (#385)

The S3 Sink currently uses the multipart upload API. This can leave "parts" aborted or orphaned if there is an issue with the connector or the API, and these parts should be periodically cleaned up.

Adding information to the S3 readme on AWS lifecycle rules which can be used to remove old parts.

---------

Signed-off-by: Aindriu Lavelle
---
 s3-sink-connector/README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/s3-sink-connector/README.md b/s3-sink-connector/README.md
index fb690d5e8..f0e68ee61 100644
--- a/s3-sink-connector/README.md
+++ b/s3-sink-connector/README.md
@@ -625,6 +625,14 @@ There are four configuration properties to configure retry strategy exists.
 - To use SSE-KMS set to `aws:kms`
 - To use DSSE-KMS set to `aws:kms:dsse`
 
+
+### Cleaning temporary files from failed multipart uploads
+The S3 Sink Connector uploads files using the S3 multipart upload API to improve performance and to handle large files.
+Occasionally the API can throw an exception or the connector can fail to complete a multipart upload.
+This can leave orphaned "parts" of a failed multipart upload, which take up unnecessary space.
+To handle these incomplete parts, AWS recommends setting up a Lifecycle rule that deletes old parts that were never completed, as described in this [blog post](https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/).
+Alternatively, if you would prefer to work through the official documentation, it is available [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html).
+
 ## Development
 
 ### Developing together with Commons library
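
Reviewer note: the README section added by this patch links to the AWS guidance but does not show what such a Lifecycle rule looks like. The following is a minimal sketch in Python, not part of the connector, of a rule that aborts incomplete multipart uploads. The function names, the rule ID, and the 7-day threshold are illustrative assumptions; `put_bucket_lifecycle_configuration` is the boto3 call that applies the rule, and it needs boto3 installed and AWS credentials configured.

```python
import json


def abort_incomplete_mpu_rule(days_after_initiation: int = 7, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that aborts stale multipart uploads.

    The 7-day default and the rule ID are illustrative assumptions.
    """
    if days_after_initiation < 1:
        raise ValueError("days_after_initiation must be at least 1")
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix covers the whole bucket
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": days_after_initiation
                },
            }
        ]
    }


def apply_rule(bucket: str, config: dict) -> None:
    """Apply the configuration with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    # Note: this call replaces any lifecycle configuration already on the bucket.
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Print the rule as JSON so it can be reviewed before it is applied.
    print(json.dumps(abort_incomplete_mpu_rule(), indent=2))
```

Building the rule separately from applying it lets the JSON be inspected first, and lets it be merged with any lifecycle rules the bucket already has, since applying a configuration overwrites the existing one.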