Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-20339: [C++] Add residual filter support to swiss join #39487

Merged
merged 34 commits into from
Mar 12, 2024

Conversation

zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Jan 6, 2024

Rationale for this change

Add residual filter support to swiss join.

What changes are included in this PR?

  1. Added class JoinResidualFilter as a centralized structure to evaluate residual filter in swiss join. It has various flavors of filtering for various join types. Zero-overhead is guaranteed for trivial filters (literal true and sometimes literal false/null). More detailed explanation in code comments.
  2. Tuned the structure of swiss join main body (JoinProbeProcessor::OnNextBatch) to better cope with JoinResidualFilter calls.

Are these changes tested?

Legacy UTs (HashJoin.Random, HashJoin.ResidualFilter and HashJoin.TrivialResidualFilter) cover part of this change. New fine-grained residual filter cases added as well.

Are there any user-facing changes?

No.

Copy link

github-actions bot commented Jan 6, 2024

⚠️ GitHub issue #20339 has been automatically assigned in GitHub to PR creator.

@zanmato1984 zanmato1984 changed the title GH-20339: [C++] Add residual predicate support to swiss join GH-20339: [C++] Add residual filter support to swiss join Jan 6, 2024
@@ -1123,29 +1143,6 @@ uint32_t SwissTableForJoin::payload_id_to_key_id(uint32_t payload_id) const {
return static_cast<uint32_t>(first_greater - entries) - 1;
}

void SwissTableForJoin::payload_ids_to_key_ids(int num_rows, const uint32_t* payload_ids,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted because used nowhere in existing code base.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 8, 2024
@vibhatha
Copy link
Collaborator

vibhatha commented Jan 9, 2024

@zanmato1984 would it be possible to add some test cases?

@zanmato1984
Copy link
Contributor Author

@zanmato1984 would it be possible to add some test cases?

Yes, I am working on that.

@vibhatha
Copy link
Collaborator

vibhatha commented Jan 9, 2024

@zanmato1984 I will completely review once the test cases are there.

@zanmato1984
Copy link
Contributor Author

I tried to rebase and get things messy. Let me fix my branch first. Sorry for the trouble, guys :(

@zanmato1984
Copy link
Contributor Author

Hi @westonpace , I've addressed the comments you previously left except one thing I'm uncertain about in my last post. Would you please take a look again? Thanks!

@westonpace westonpace merged commit 0ce7267 into apache:main Mar 12, 2024
35 of 36 checks passed
@westonpace westonpace removed the awaiting merge Awaiting merge label Mar 12, 2024
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 0ce7267.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 11 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][Compute] Add residual predicate support to new (Swiss) hash join
3 participants