Adding information for handling S3 Sink multipart upload aborted parts (#385)

The S3 Sink currently uses the multipart upload API. If there is an issue with
the connector or the API, this can leave "parts" aborted or orphaned, and these
parts should be periodically cleaned up.

Adding information to the S3 README on AWS lifecycle rules, which can be
used to remove old parts.

---------

Signed-off-by: Aindriu Lavelle <aindriu.lavelle@aiven.io>
aindriu-aiven authored Jan 27, 2025
1 parent 2edf58b commit 62667bf
Showing 1 changed file with 8 additions and 0 deletions: s3-sink-connector/README.md
@@ -625,6 +625,14 @@ There are four configuration properties to configure retry strategy exists.
- To use SSE-KMS set to `aws:kms`
- To use DSSE-KMS set to `aws:kms:dsse`


### Cleaning temporary files from failed multipart uploads
The S3 Sink Connector uploads files using the S3 multipart upload API to improve performance and to handle large files.
Occasionally, the API can throw an exception or the connector can fail to complete a multipart upload.
This can leave orphaned "parts" of a failed multipart upload taking up unnecessary space.
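
To check whether a bucket already holds such leftovers, the in-progress uploads can be listed with `aws s3api list-multipart-uploads --bucket <bucket>` or boto3's `list_multipart_uploads`. The following is a minimal sketch, not part of the connector: it assumes a response shaped like boto3's (an `Uploads` list whose entries carry `Key`, `UploadId` and a timezone-aware `Initiated` datetime), and the helper `find_stale_uploads` and the sample keys are hypothetical. It only reports uploads older than a cut-off, so uploads the connector is still working on are left alone.

```python
from datetime import datetime, timedelta, timezone


def find_stale_uploads(response, older_than_days, now=None):
    """Return (key, upload_id) pairs for incomplete multipart uploads that
    were initiated more than `older_than_days` days before `now`.

    Assumes `response` is shaped like a boto3 ListMultipartUploads result:
    {"Uploads": [{"Key": ..., "UploadId": ..., "Initiated": datetime}, ...]}
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    stale = []
    # "Uploads" may be absent when the bucket has no in-progress uploads.
    for upload in response.get("Uploads", []):
        if upload["Initiated"] < cutoff:
            stale.append((upload["Key"], upload["UploadId"]))
    return stale


# Hypothetical sample data shaped like a ListMultipartUploads response.
now = datetime(2025, 1, 27, tzinfo=timezone.utc)
sample = {
    "Uploads": [
        {"Key": "topic/partition-0/00001", "UploadId": "upload-a",
         "Initiated": now - timedelta(days=10)},
        {"Key": "topic/partition-0/00002", "UploadId": "upload-b",
         "Initiated": now - timedelta(hours=2)},
    ]
}
print(find_stale_uploads(sample, older_than_days=7, now=now))
# [('topic/partition-0/00001', 'upload-a')]
```

Each reported pair could then be passed to boto3's `abort_multipart_upload(Bucket=..., Key=..., UploadId=...)`. A real listing is paginated, so large buckets need the `IsTruncated`, `NextKeyMarker` and `NextUploadIdMarker` fields or a boto3 paginator.
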
To clean up these incomplete parts, AWS recommends a lifecycle rule that aborts multipart uploads that have not completed within a set number of days; aborting an upload deletes its parts. The approach is described in this [AWS blog post](https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/).
Alternatively, if you would prefer to work through the official documentation, it is available in the [Amazon S3 User Guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html).
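
As an illustration of such a rule, the sketch below builds the lifecycle configuration structure that the S3 API expects, using its `AbortIncompleteMultipartUpload` action with `DaysAfterInitiation`. It is a minimal sketch under stated assumptions: the helper names, the rule ID and the 7-day window are hypothetical choices, and the window should presumably be comfortably longer than the connector ever needs to finish an upload.

```python
import json


def abort_incomplete_mpu_rule(days_after_initiation, prefix="",
                              rule_id="abort-incomplete-multipart-uploads"):
    """Build one S3 lifecycle rule that aborts multipart uploads still
    incomplete `days_after_initiation` days after they were started.
    The rule ID and default prefix are illustrative assumptions."""
    return {
        "ID": rule_id,
        # An empty prefix applies the rule to every object key in the bucket.
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {
            "DaysAfterInitiation": days_after_initiation,
        },
    }


def lifecycle_configuration(rules):
    """Wrap rules in the top-level lifecycle configuration structure."""
    return {"Rules": list(rules)}


# Hypothetical 7-day window; tune it to your own workload.
config = lifecycle_configuration([abort_incomplete_mpu_rule(7)])
print(json.dumps(config, indent=2))
```

Saved to a file such as `lifecycle.json`, the printed JSON can be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`, or the dict can be passed to boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. Both replace the bucket's entire lifecycle configuration, so any existing rules should be fetched first (for example with `get_bucket_lifecycle_configuration`) and included in `rules`.
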

## Development

### Developing together with Commons library