Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding enum and range datatype support for postgres #5458

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

divbok
Copy link
Contributor

@divbok divbok commented Feb 24, 2025

Description

  • Adding enum and range datatype support for Postgres.
  • As enum does not have a fixed type_id, the enum columns are first fetched from the database and stored in Postgres Stream State and then the data type mapping is done. The enum type is not stored as user-defined enum type, instead we overwrite it to enum while storing.
  • Range datatype has been added without including numeric range data type and multirange. Opensource does not support the high precision and scale of numeric postgres data type hence the mapping cannot be done here.
  • No corresponding multirange data type is supported in Opensource.

Issues Resolved

Contributes to #5309

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Divyansh Bokadia <dbokadia@amazon.com>

Adding enum and range datatype support for postgres

Adding enum and range datatype support for postgres
@@ -49,11 +50,19 @@ public enum ColumnType {
XML(142, "xml"),
UUID(2950, "uuid"),
PG_LSN(3220, "pg_lsn"),
PG_SNAPSHOT(5038, "pg_snapshot"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond this PR, why do we have this enum mapping? Is this not provided by the Posgres library itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such library which provides the type_id of these data types.

@@ -22,6 +25,9 @@ public class PostgresStreamState {
@JsonProperty("replicationSlotName")
private String replicationSlotName;

@JsonProperty("enumColumnsMap")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a clearer name here? What exactly is this storing?

Copy link
Contributor Author

@divbok divbok Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It stores the table name against the set of all enum columns. That is why i went with this naming convention.

private static final NumericTypeHandler numericTypeHandler = new NumericTypeHandler();
private static final TemporalTypeHandler temporalTypeHandler = new TemporalTypeHandler();

private static final Map<PostgresDataType.DataCategory, PostgresDataTypeHandler> typeHandlers = Map.ofEntries(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why you moved away from Map.of to ofEntries? The original had a little less boilerplate code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map.of allows only 10 arguments, so I moved to Map.ofEntries


private Object parseRangeValue(PostgresDataType columnType,String columnName, String rangeString) {

if(rangeString.equals("empty"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make "empty" a constant.

private static final String POSTGRES_TIMESTAMP_MAX_INFINITY = "9999-12-31 23:59:59";
private static final String POSTGRES_TIMESTAMPTZ_MIN_INFINITY = "1970-01-01 00:00:00+00";
private static final String POSTGRES_TIMESTAMPTZ_MAX_INFINITY = "9999-12-31 23:59:59+00";
public static final String POSTGRES_DATE_MIN_INFINITY = "1970-01-01";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Map<String, Object> rangeMap = new HashMap<>();
if(!lowerBound.isEmpty()) {
Object parsedLowerBound = numericTypeHandler.handle(columnType, columnName, lowerBound);
rangeMap.put(lowerBoundInclusivity.equals("[") ? "gte" : "gt", parsedLowerBound);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make "gte", "gt", "lte", "lt" constants as well.

@@ -91,6 +91,7 @@ public Map<String, String> getColumnDataTypes(final String fullTableName) {
);
}
}
return columnsToDataType;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing these!

@@ -229,6 +234,88 @@ void getColumnDataTypes_whenColumnsExist_shouldReturnValidMapping() throws SQLEx
assertThat(result, equalTo(expectedColumnTypes));
}

@Test
void getEnumColumns_successfully_returns_enum_columns() throws SQLException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for getColumnDataTypes when enums are involved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants