Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding readme for s3 source [KCON-9] #376

Merged
merged 3 commits into from
Dec 24, 2024
Merged

Conversation

muralibasani
Copy link
Contributor

@muralibasani muralibasani commented Dec 18, 2024

KCON-9 Read me to configure aws s3 source connector

@muralibasani muralibasani changed the title Adding readme for s3 source Adding readme for s3 source [KCON-9] Dec 18, 2024
@muralibasani muralibasani marked this pull request as ready for review December 19, 2024 08:00
@muralibasani muralibasani requested review from a team as code owners December 19, 2024 08:00
@@ -1,6 +1,6 @@
# Aiven's S3 Source Connector for Apache Kafka

This is a source Apache Kafka Connect connector that stores Apache Kafka messages in an AWS S3 bucket.
This is a source Apache Kafka Connect connector that stores AWS S3 bucket objects in Apache Kafka.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This is a source Apache Kafka Connect connector that stores AWS S3 bucket objects in Apache Kafka.
This is a source Apache Kafka Connect connector that adds AWS S3 bucket objects onto an Apache Kafka topic.


Example object name : customertopic-00001-1734445664111.txt

## Data Format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header seems incorrect for what is below.
Is this just # Configuration ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it is just configuration, but under it there are relevant sections.
And similar to readme in sink.

### S3 Object Names

S3 connector stores series of files in the specified bucket.
Each object is named using pattern `[<aws.s3.prefix>]<topic>-<partition>-<start_offset>-<extension>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we have it set yet, but there should be a configuration for setting object names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rephrased it. But we are still following the pattern [<aws.s3.prefix>]--<start_offset>- right ?

is deprecated and will be replaced with new one during a certain transition period (within 2-3 releases). Most of the
> configuration parameters remain same.

List of deprecated configuration parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove the deprecated config params and not support them in the S3 Source Connector so we dont have any legacy we need to clean up in the future.

# Possible values 'none' or 'all'. Default being 'none'
# If 'none', then any errors during data transformations or reading objects would cause the connector to fail and shutdown.
# If 'all', then any errors during data transformations or reading objects would have no impact and the connector and
# worker task will continue with next records.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just add errors are logged and ignored for 'all'

# A comma-separated list of topic partitions where the connector's offset storage reader
# can read the stored offsets for those partitions. If not mentioned, s3 objects will be read again if
# available in the bucket
topic.partitions=1,2,3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seem incorrect for the source connector. We should just read all the objects (ideally just once) is this a sink only config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for source connector, and based on these partitions, the offset storage reader reads the map

Copy link
Contributor

@aindriu-aiven aindriu-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, I think this wont be perfect when we merge it there will be a few more updates as we get things done done for release.

@muralibasani
Copy link
Contributor Author

A few comments, I think this wont be perfect when we merge it there will be a few more updates as we get things done done for release.

Yea indeed. Pushed changes. We can always keep updating.

Copy link
Contributor

@aindriu-aiven aindriu-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you for the updates.

@Claudenw Claudenw merged commit e0184bb into s3-source-release Dec 24, 2024
8 checks passed
@Claudenw Claudenw deleted the readme-s3-source branch December 24, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants