Skip to content

Commit

Permalink
updated as per review
Browse files Browse the repository at this point in the history
  • Loading branch information
¨Claude committed Jan 28, 2025
1 parent 6ac5352 commit 9d16763
Showing 1 changed file with 2 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -35,13 +35,13 @@ The OffsetManager will provide information about source data that the system has
* `Optional<E> getEntry(final OffsetManagerKey key, final Function<Map<String, Object>, E> creator)` - This method retrieves any data Kafka has about the object identified by the key. If the data are found then the `creator` function is called to create the OffsetManagerEntry implementation. Otherwise, an empty Optional is returned.
* `void remove(final OffsetManagerKey key)` - removes the data from the local OffsetManager cache. This is called when data has been set to Kafka for the record identified by the key. Subsequent calls to `getEntry` will reload the data from Kafka.
* `void remove(final SourceRecord sourceRecord)` - removes the data from the local OffsetManager cache. This is called when the SourceRecord is sent to Kafka. Subsequent calls to `getEntry` will reload the data from Kafka.
* `void load(Collection<OffsetManagerKey> keys)` - loads the data associated with a collection of keys into the OffsetManager cache so that they can be processed without further calls to retrieve the offset data from Kafka.
* `void populateOffsetManager(Collection<OffsetManagerKey> keys)` - loads the data associated with a collection of keys into the OffsetManager cache so that they can be processed without further calls to retrieve the offset data from Kafka.

### OffsetManagerEntry

The `OffsetManagerEntry` is the base class for source specific `OffsetManager` data. It contains and manages the key and data for the OffsetManager. Every data source has a way to identify the data source within the system. This information should be mapped into the OffsetManagerEntry key. Every processing stream that converts data from the source representation to the final SourceRecord(s) has state that it needs to track. The OffsetManagerEntry handles that information and makes it easy to store and retrieve it. The key and the data are key/value maps where the key is a string and the value is an object. The OffsetManagerEntry provides access to the map data via getter and setter methods and provides conversions to common data types. Implementations of OffsetManagerEntry should call the getters/setters to get/store data that will be written to the kafka storage. The class may track any other data this is desired by the developers but only the data in the key and data maps will be preserved.

When thinking about the OffsetManagerEntry they key should be a simple map of key/value pairs that describe the location of the object in the storage system. For something like S3 this would be the bucket and object key, other systems have other coordinate systems. The `getManagerKey` method should return the map. It is important to note that all integral numbers are stored in the Kafka maps as Longs. So all integral key values should be stored a Longs in the map. When integer data values are retrieved the `getInteger(String)` method may be used to ensure that the Long is converted into an Integer.
When thinking about the OffsetManagerEntry they key should be a simple map of key/value pairs that describe the location of the object in the storage system. For something like S3 this would be the bucket and object key, for a database it would be table and primaryKey, other systems have other coordinate systems. The `getManagerKey` method should return the map. It is important to note that all integral numbers are stored in the Kafka maps as Longs. So all integral key values should be stored a Longs in the map. When integer data values are retrieved the `getInteger(String)` method may be used to ensure that the Long is converted into an Integer.

The data that is preserved in the data map should be used to store the state of the extraction of records from the source system. This can include the number of records extracted from systems that store multiple records in an object, or a flag that says the object was processed, or any other information that will help the system determine if the object needs to continue processing. The method `getProperties` should return this map.

Expand Down

0 comments on commit 9d16763

Please sign in to comment.