Add TPC-H PPL query suite #830

Merged on Nov 7, 2024 (7 commits)

@LantaoJin LantaoJin commented Oct 29, 2024

Description

Add a TPC-H PPL query suite to:

  • ensure all TPC-H SQL queries can be rewritten in PPL
  • ensure all 22 PPL queries can be executed in Spark
  • ensure all generated physical plans match the expected output
  • ensure there is no performance regression
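
One common way to make the plan-matching goal checkable is a golden-file comparison: strip run-specific details from the plan text, then compare it with a stored expected plan. The sketch below is only an illustration in Python, not the code in this PR. The function names, the sample plan strings, and the choice to rewrite Spark expression IDs (the `#12` style suffixes) to `#x` are all assumptions.

```python
import re

# Assumption: expression IDs look like "#12" and change between runs.
_EXPR_ID = re.compile(r"#\d+")


def normalize_plan(plan: str) -> str:
    """Replace run-specific expression IDs with "#x" and trim whitespace."""
    lines = plan.strip().splitlines()
    return "\n".join(_EXPR_ID.sub("#x", line).rstrip() for line in lines)


def plans_match(actual: str, expected: str) -> bool:
    """Compare two plan strings after normalization."""
    return normalize_plan(actual) == normalize_plan(expected)


# Hypothetical plan fragments that differ only in expression IDs.
actual_plan = """
*(1) Project [l_orderkey#12L, l_quantity#15]
+- *(1) Filter (l_quantity#15 > 10)
"""
expected_plan = """
*(1) Project [l_orderkey#40L, l_quantity#43]
+- *(1) Filter (l_quantity#43 > 10)
"""

print(plans_match(actual_plan, expected_plan))
```

Mapping every ID to the same `#x` placeholder keeps the sketch short, but it can no longer tell two different attributes with the same name apart. A stricter comparison would renumber IDs in order of first appearance.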

Related Issues

Resolves #806

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • Newly added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@YANG-DB added the Lang:PPL, 0.6, and testing labels on Oct 29, 2024

All TPC-H PPL Queries located in `integ-test/src/integration/resources/tpch` folder.

To test all queries, run `org.opensearch.flint.spark.ppl.tpch.TPCHQueryITSuite`.
@LantaoJin can you please add a printout of the results from a run of TPCHQueryITSuite here?

@LantaoJin LantaoJin Nov 7, 2024

The current suite only checks that the generated code (Spark whole-stage codegen) for all 22 PPL queries compiles properly. The output shows the codegen time and execution time, but these are not stable because they depend on hardware. So no input data is required for this PR, which only provides functional coverage as a test suite.
I will create a separate issue for benchmarking later.
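
The idea of checking functionality while only reporting timings can be sketched as follows. This is a hedged Python illustration, not the Scala suite from this PR: the `q1` to `q22` names, the `run_suite` helper, and the `check` callback (a stand-in for "compile the generated code for this query") are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical query names: q1 .. q22.
QUERY_NAMES = [f"q{i}" for i in range(1, 23)]


def run_suite(check: Callable[[str], None]) -> Tuple[List[str], Dict[str, float]]:
    """Run `check` for every query; return failed names and elapsed seconds.

    Pass/fail depends only on whether `check` raises. Timings are collected
    for reporting and are never asserted on, since they vary with hardware.
    """
    failures: List[str] = []
    timings: Dict[str, float] = {}
    for name in QUERY_NAMES:
        start = time.perf_counter()
        try:
            check(name)
        except Exception:
            failures.append(name)
        timings[name] = time.perf_counter() - start
    return failures, timings


def stub_check(name: str) -> None:
    """Stand-in check that fails for one query, to show failure reporting."""
    if name == "q21":
        raise RuntimeError("codegen failed (simulated)")


failed, elapsed = run_suite(stub_check)
print("failed:", failed)
print("queries timed:", len(elapsed))
```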

@LantaoJin LantaoJin merged commit 48be5cc into opensearch-project:main Nov 7, 2024
4 checks passed
kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
* Add TPC-H PPL query suite

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix failure of loading resources

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix data_add()

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* enable q21 and add docs

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Labels: 0.6, Lang:PPL, testing
Linked issue: [FEATURE] Add a TPC-H PPL query suite
2 participants