GH-39398: [C++][Parquet] DNM: benchmark for readLevels #39486

Closed
wants to merge 1 commit into from

mapleFU
Member

@mapleFU mapleFU commented Jan 6, 2024

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@mapleFU
Member Author

mapleFU commented Jan 6, 2024

Please do not merge this patch; it is only for benchmarking.

for (auto _ : state) {
  state.PauseTiming();
  Int32Reader* reader = helper.ResetColumnReader();
  [[maybe_unused]] bool v = reader->HasNext();
Member Author

Calling HasNext() here to trigger initialization.

const auto repetition = static_cast<Repetition::type>(state.range(0));
const auto batch_size = static_cast<int64_t>(state.range(1));

BenchmarkHelper helper(repetition, /*num_pages=*/1, /*levels_per_page=*/16 * 80000);
Member Author

Using a single page to keep it simple.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 6, 2024
@@ -300,6 +300,9 @@ class TypedColumnReader : public ColumnReader {
int16_t* rep_levels, int32_t* indices,
int64_t* indices_read, const T** dict,
int32_t* dict_len) = 0;

virtual void ReadLevels(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels,
Contributor

What do you think about making this private with a Test Peer that can be used in the benchmark, so we can check this PR in?
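For readers unfamiliar with the "Test Peer" idea suggested above, here is a minimal sketch of the pattern. `ToyColumnReader` and `ToyColumnReaderTestPeer` are hypothetical names made up for illustration; this is not the real `TypedColumnReader` interface, only one way the suggestion might look.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column reader (NOT the real Parquet class).
class ToyColumnReader {
 public:
  explicit ToyColumnReader(std::vector<int16_t> def_levels)
      : def_levels_(std::move(def_levels)) {}

 private:
  // Kept private so it does not become part of the public interface.
  // Copies up to batch_size levels into out and returns how many were copied.
  int64_t ReadLevels(int64_t batch_size, int16_t* out) {
    assert(batch_size >= 0);
    const int64_t remaining = static_cast<int64_t>(def_levels_.size()) - pos_;
    const int64_t n = std::min(batch_size, remaining);
    std::copy_n(def_levels_.data() + pos_, n, out);
    pos_ += n;
    return n;
  }

  std::vector<int16_t> def_levels_;
  int64_t pos_ = 0;

  // Only the peer gets access to the private members.
  friend class ToyColumnReaderTestPeer;
};

// The peer forwards to the private method; tests and benchmarks use it
// instead of widening the reader's public API.
class ToyColumnReaderTestPeer {
 public:
  static int64_t ReadLevels(ToyColumnReader& reader, int64_t batch_size,
                            int16_t* out) {
    return reader.ReadLevels(batch_size, out);
  }
};
```

A benchmark would then call `ToyColumnReaderTestPeer::ReadLevels(reader, batch_size, buffer)` rather than a public method on the reader.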

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jan 6, 2024
@emkornfield
Contributor

Thank you @mapleFU, it would be great to check this in (without the count changes) so we have a good benchmark for ReadLevels. See my comment on how this might be done without breaking abstractions.
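For context on the "count changes": the linked issue proposes using `std::count` in the Parquet ColumnReader. The sketch below assumes the count in question is the number of levels equal to some target value (for example the maximum definition level); that is an assumption, and the function names are hypothetical rather than taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hand-written loop: counts how many levels equal `target`.
inline int64_t CountLevelsLoop(const int16_t* levels, int64_t num_levels,
                               int16_t target) {
  assert(num_levels >= 0);
  int64_t count = 0;
  for (int64_t i = 0; i < num_levels; ++i) {
    if (levels[i] == target) ++count;
  }
  return count;
}

// Same result via std::count over the contiguous level buffer.
inline int64_t CountLevelsStd(const int16_t* levels, int64_t num_levels,
                              int16_t target) {
  assert(num_levels >= 0);
  return static_cast<int64_t>(std::count(levels, levels + num_levels, target));
}
```

Both functions return the same value; whether the standard algorithm is actually faster is exactly what a ReadLevels benchmark would have to show.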

@emkornfield
Contributor

I guess looking at CI it would take a little more work than this proof of concept to check it in.

@mapleFU
Member Author

mapleFU commented Jan 6, 2024

This is just a quick proof of concept for the ReadLevels optimization. I think exporting it would be too hacky, because ReadLevels only reads levels from the current page. That is why HasNext is called first, and why I only maintain one page. The resulting interface would be weird.
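To illustrate the point about page boundaries, here is a toy model. `ToyPagedLevelReader` is a hypothetical class, not the Parquet implementation; it only assumes what the comment states, namely that ReadLevels reads from the current page and that HasNext is what loads a page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy reader: levels are stored page by page.
class ToyPagedLevelReader {
 public:
  explicit ToyPagedLevelReader(std::vector<std::vector<int16_t>> pages)
      : pages_(std::move(pages)) {}

  // Loads the first page on the first call, and advances to the next page
  // once the current one is exhausted. Returns false when no levels remain.
  bool HasNext() {
    if (!loaded_) {
      loaded_ = true;
      page_idx_ = 0;
      pos_ = 0;
    }
    while (page_idx_ < pages_.size() && pos_ >= pages_[page_idx_].size()) {
      ++page_idx_;
      pos_ = 0;
    }
    return page_idx_ < pages_.size();
  }

  // Reads at most batch_size levels, but never crosses a page boundary.
  // Returns 0 if HasNext() has not loaded a page yet.
  int64_t ReadLevels(int64_t batch_size, int16_t* out) {
    assert(batch_size >= 0);
    if (!loaded_ || page_idx_ >= pages_.size()) return 0;
    const std::vector<int16_t>& page = pages_[page_idx_];
    const int64_t remaining = static_cast<int64_t>(page.size() - pos_);
    const int64_t n = std::min(batch_size, remaining);
    std::copy_n(page.data() + pos_, n, out);
    pos_ += static_cast<std::size_t>(n);
    return n;
  }

 private:
  std::vector<std::vector<int16_t>> pages_;
  bool loaded_ = false;
  std::size_t page_idx_ = 0;
  std::size_t pos_ = 0;
};
```

With two pages of sizes 3 and 2, a request for 8 levels returns 3, then 0 until HasNext is called again, then 2. A caller of a public ReadLevels would have to know about this coupling, which is the awkwardness described above.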

@pitrou
Member

pitrou commented Jan 6, 2024

@ursabot please benchmark

@ursabot

ursabot commented Jan 6, 2024

Benchmark runs are scheduled for commit 0529cce. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 0529cce.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details.

@mapleFU
Member Author

mapleFU commented Jan 7, 2024

Hmm, are the regressed benchmarks unrelated to this change?

@pitrou
Member

pitrou commented Jan 7, 2024

Hmm, are the regressed benchmarks unrelated to this change?

None of them are related to Parquet.

@mapleFU mapleFU closed this Jan 9, 2024
Successfully merging this pull request may close these issues.

[C++] [Parquet] Use std::count in parquet ColumnReader
4 participants