
Support OpenSearch alias field type #1032

Merged: 5 commits merged on Feb 6, 2025


@penghuo (Collaborator) commented Feb 4, 2025

Description

Summary

This PR introduces support for alias fields in FlintDataType, allowing fields to be referenced by alternative names. The change enables schema parsing and JSON processing to recognize alias fields and map them to their corresponding primary fields.
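To illustrate the schema side, here is a minimal Python sketch (not the actual Scala code in FlintDataType.scala) of how an OpenSearch mapping containing an alias field could be turned into schema fields. It assumes the standard OpenSearch alias mapping shape `{"type": "alias", "path": "<target>"}` and handles only top-level properties. The names `SchemaField`, `deserialize_properties`, `TYPE_MAP`, and the `aliasPath` metadata key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, illustrative subset of OpenSearch-to-Spark type names.
TYPE_MAP = {
    "keyword": "string",
    "text": "string",
    "long": "long",
    "integer": "int",
    "double": "double",
    "boolean": "boolean",
}


@dataclass
class SchemaField:
    name: str
    data_type: str
    # Hypothetical metadata: "aliasPath" records the field an alias points to.
    metadata: Dict[str, str] = field(default_factory=dict)


def deserialize_properties(properties: Dict[str, dict]) -> List[SchemaField]:
    """Turn top-level OpenSearch mapping properties into schema fields."""
    fields: List[SchemaField] = []
    for name, spec in properties.items():
        if spec.get("type") == "alias":
            path = spec["path"]
            # An alias takes the data type of the field it points to.
            # Assumption: the target is a top-level, non-alias field.
            target_type = properties[path]["type"]
            fields.append(SchemaField(name, TYPE_MAP[target_type], {"aliasPath": path}))
        else:
            fields.append(SchemaField(name, TYPE_MAP[spec["type"]]))
    return fields


mapping = {
    "properties": {
        "distance": {"type": "long"},
        "route_length_miles": {"type": "alias", "path": "distance"},
    }
}
schema = deserialize_properties(mapping["properties"])
for f in schema:
    print(f.name, f.data_type, f.metadata)
```

In this sketch the alias reuses its target's data type, and the metadata entry is what a parser could later use to decide which JSON key feeds the alias column.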

Changes

  1. FlintDataType.scala

    • Updated deserializeJValue to identify and process alias fields.
    • Modified deserializeField to support alias metadata assignment.
  2. FlintJacksonParser.scala

    • Implemented fieldMapping to associate JSON keys with schema field indices, allowing alias fields to be correctly resolved during parsing.
    • Updated JSON parsing logic to handle multiple schema fields mapped to a single JSON key.
  3. Documented the OpenSearch table data types in the docs.
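The fieldMapping idea described for FlintJacksonParser.scala can be sketched the same way. This is a hypothetical Python version, not the PR's Scala code: `build_field_mapping`, `parse_document`, and the `(name, alias_path)` schema shape are illustrative assumptions. Each JSON key maps to a list of schema indices, so one key in the source document fills both the primary field and every alias that points at it.

```python
from typing import Any, Dict, List, Optional, Tuple

# Each schema entry is (field name, alias target or None); hypothetical shape.
Schema = List[Tuple[str, Optional[str]]]


def build_field_mapping(schema: Schema) -> Dict[str, List[int]]:
    """Map every JSON key to all schema indices that should receive its value."""
    field_mapping: Dict[str, List[int]] = {}
    for index, (name, alias_path) in enumerate(schema):
        # An alias reads its value from the JSON key of the field it points to.
        key = alias_path if alias_path is not None else name
        field_mapping.setdefault(key, []).append(index)
    return field_mapping


def parse_document(doc: Dict[str, Any], schema: Schema,
                   field_mapping: Dict[str, List[int]]) -> List[Any]:
    """Build a row; one JSON key may fill several positions (field plus aliases)."""
    row: List[Any] = [None] * len(schema)
    for key, value in doc.items():
        for index in field_mapping.get(key, []):  # keys not in the schema are skipped
            row[index] = value
    return row


schema: Schema = [("distance", None), ("route_length_miles", "distance"), ("name", None)]
field_mapping = build_field_mapping(schema)
row = parse_document({"distance": 42, "name": "trail"}, schema, field_mapping)
print(field_mapping)  # {'distance': [0, 1], 'name': [2]}
print(row)            # [42, 42, 'trail']
```

Mapping keys to lists of indices, rather than to a single index, is what lets the parser handle several schema fields backed by one JSON key in a single pass over the document.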

Testing

  • Added test cases for alias field deserialization to validate schema correctness.
  • Integration tests confirm alias field support in OpenSearch queries.

Related Issues

#1033

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • Newly added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peng Huo <penghuo@gmail.com>
@penghuo self-assigned this Feb 4, 2025
@penghuo marked this pull request as ready for review February 5, 2025 00:42
@penghuo added the labels opensearch table, enhancement (New feature or request) Feb 5, 2025
@penghuo (Collaborator, Author) commented Feb 5, 2025

@ykmr1224 @dai-chen, please help review.

@penghuo requested a review from ykmr1224 February 5, 2025 20:52

@dai-chen (Collaborator) left a comment


Thanks for the changes!

@qianheng-aws (Contributor) commented Feb 6, 2025

LGTM!

One minor concern from me is that the implementation in this PR is closer to Option 3 as discussed in opensearch-project/sql#3246 (comment).

So there are some minor behavioral differences between PPL on an OpenSearch index and Flint PPL. Because we populate the InternalRow with alias columns already filled with the original fields' data during the scan, the alias columns won't reflect updates made to the original columns in later steps of the query.

Nevertheless, since Spark doesn't support an alias type, implementing something like Option 1 in Flint is challenging. Besides, this is a rare edge case: a user would have to query an alias column while also updating its original column in the same query.
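The difference can be modeled with a toy example. This is a hypothetical Python sketch, not Flint code, and the field names are illustrative: `scan_materialized` copies the alias value into the row at scan time, as the comment describes for this PR, while `AliasView` resolves the alias on every read, the kind of behavior the comment says is hard to get in Spark without an alias type.

```python
from typing import Any, Dict


def scan_materialized(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Scan-time copy: the alias column receives the original value once."""
    return {"distance": doc["distance"], "route_length_miles": doc["distance"]}


class AliasView:
    """Read-time resolution: the alias always reads the current original column."""

    def __init__(self, row: Dict[str, Any]) -> None:
        self.row = row

    def __getitem__(self, key: str) -> Any:
        if key == "route_length_miles":
            return self.row["distance"]
        return self.row[key]


# A later step of the same query overwrites the original column.
materialized = scan_materialized({"distance": 10})
materialized["distance"] = 99
print(materialized["route_length_miles"])  # 10: the alias keeps the scanned value

view = AliasView({"distance": 10})
view.row["distance"] = 99
print(view["route_length_miles"])  # 99: the alias follows the update
```

The materialized copy only goes stale when the same query overwrites the original column, which is the edge case the comment accepts as a trade-off.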

@penghuo (Collaborator, Author) commented Feb 6, 2025


Yes, it is Option 3.
This is not an issue for OpenSearch because OpenSearch search operates in near-real-time: the latest document updates may not be immediately reflected in search results anyway.

@penghuo merged commit eff717a into opensearch-project:main Feb 6, 2025
4 checks passed
@penghuo deleted the dataTypeAlias branch February 6, 2025 21:39
Labels: enhancement (New feature or request)

5 participants