-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deal with Control Characters in DynamoDB #5437
Deal with Control Characters in DynamoDB #5437
Conversation
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Not sure if I need to make the same change in Export Record Converter because I'm not sure what the export option does for dynamo db source |
@@ -67,6 +68,7 @@ public StreamRecordConverter(final BufferAccumulator<org.opensearch.dataprepper. | |||
this.bytesProcessedSummary = pluginMetrics.summary(BYTES_PROCESSED); | |||
this.streamConfig = streamConfig; | |||
|
|||
MAPPER.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Galactus22625 for working this.
I tend to think this is not the right approach. Here are a few reasons.
- The DDB records do in fact contain properly escaped control characters. So the ultimate bug appears to be that we are unescaping them (probably through the
EnhancedDocument
). - This may result in allowing leniency that we don't actually want to allow. Again, this is not a problem with the input data.
- There may be other issues with the serialization we miss from this.
- Looking at the code, there is a bit of overhead in this EnhancedDocument approach anyway. We take the DDB record, make a JSON string, then deserialize it.
See this comment in particular:
It appears that the best options are:
- Contact the DDB team to see if this is the expected behavior of
EnhancedDocument
. Perhaps by opening a GitHub issue. - or; Change the implementation to parse the DDB record using the approach I took in the proof-of-concept that I shared in the comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will try implementing a solution in the style of the proof-of-concept from the comment above
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
private Map<String, Object> convertData(Map<String, AttributeValue> data) throws JsonProcessingException { | ||
String jsonData = EnhancedDocument.fromAttributeValueMap(data).toJson(); | ||
return MAPPER.readValue(jsonData, MAP_TYPE_REFERENCE); | ||
private Map<String, Object> convertData(Map<String, AttributeValue> data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add unit tests to cover all these data types to verify that we handle them correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit tests added
break; | ||
case BS: // BS for Binary Set | ||
result.put(attributeName, attributeValue.bs().stream() | ||
.map(buffer -> BASE64_ENCODER.encodeToString(buffer.asByteArray())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the current behavior - base64 encoding the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i made sure the output matches what was being outputted before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also dynamo db only takes base64 binary values
|
||
data.forEach(((attributeValue) -> { | ||
//dealing with all dynamo db attribute types | ||
switch (attributeValue.type()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could consolidate this code with the code above.
private Object processAttributeValue(AttributeValue attributeValue) {
switch (attributeValue.type()) {
case N:
return new BigDecimal(attributeValue.n());
case B:
return BASE64_ENCODER.encodeToString(attributeValue.b().asByteArray());
....
}
}
And then use it. The following should work (or be close to working)
convertListData(...) {
return data.stream.map(this::processAttributeValue).collect(toList());
}
convertMapData(...) {
return data.stream.collect(toMap(attributeValue::type, this::processAttributeValue));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will consolidate
result.add(null); | ||
break; | ||
default: | ||
throw new IllegalStateException("Unsupported attribute type: " + attributeValue.type()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be InvalidArgumentException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will change. I was matching the error message that was being sent before.
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. Thank you for this improvement!
* Deal with Control Characters in DynamoDB Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Description
Manually Parse DDB Records
Issues Resolved
Resolves #5027 . Bug ingesting control characters from dynamo db
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.