Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Acero] Refine hash join benchmark #45591

Closed
zanmato1984 opened this issue Feb 20, 2025 · 1 comment
Closed

[C++][Acero] Refine hash join benchmark #45591

zanmato1984 opened this issue Feb 20, 2025 · 1 comment

Comments

@zanmato1984
Copy link
Contributor

Describe the enhancement requested

The current hash join benchmark can be refined in the following aspects:

  1. It is using openmp for multi-threading, and this is the solely place that depends on openmp library. Based on my experience of recent benchmarking, openmp can cause tricky inaccurate benchmarking numbers (some obvious improvement shows worse number in one or two out of tens of cases, I still don't know why), and make diagnosis hard (I was using flame graph and openmp makes it hard to interpret, see the following two pictures).
    Flame graph with openmp:
    Image
    Flame graph without openmp:
    Image
    I'm not entirely sure why openmp was introduced at the first place (maybe @save-buffer can elaborate a bit?), but I think at this point we can trivially replace it with arrow-native multi-threading primitives. (Besides a bunch of simplification of cmake could be done).

  2. In certain benchmarks, we are more interested in the number of rows from the build side. Currently all benchmarks are based on the number of rows from the probe side.

Component(s)

C++

zanmato1984 added a commit that referenced this issue Feb 21, 2025
…rom the project (#45593)

### Rationale for this change

See #45591 .

### What changes are included in this PR?

1. Replace the usage of openmp with arrow-native multi-threading primitives;
2. Remove all the occurrences of openmp from the project;
3. Support stats for build side rows in hash join benchmark, and update certain benchmark.

### Are these changes tested?

Manually tested.

### Are there any user-facing changes?

Removed a public CMake option but I think it shouldn't affect the user.

* GitHub Issue: #45591

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
@zanmato1984 zanmato1984 added this to the 20.0.0 milestone Feb 21, 2025
@zanmato1984
Copy link
Contributor Author

Issue resolved by pull request 45593
#45593

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant