
Align results with those of labs/benchmarks and the Conbench API #116

Merged: 12 commits, Dec 6, 2022

Conversation

alistaire47 (Contributor):

This PR changes a number of things, but all with the goal of aligning the JSON results arrowbench outputs with those currently posted to the Conbench API via voltrondata-labs/benchmarks so that eventually arrowbench can post its own results directly.

Changes fall into two categories:

  1. Changes to result structure to align with the /api/benchmarks/ endpoint
  2. Changes to benchmarks so the data that populates results aligns with what labs/benchmarks generates. This has only been done for the set of benchmarks run by arrow-benchmarks-ci.

This PR cannot be merged until voltrondata-labs/benchmarks#122 is merged, or it will break current arrow benchmarks. That PR contains code to handle changes to assorted names and parameter names here. It has been tested with that branch.
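For context, here is a minimal sketch (Python, purely illustrative) of the kind of JSON result structure being aligned. The field names are assumptions based on the Conbench /api/benchmarks/ endpoint discussed in this PR, not a verbatim copy of what arrowbench emits:

```python
import json

# Illustrative result payload; keys are assumptions modeled on the
# Conbench /api/benchmarks/ endpoint, not the authoritative schema.
result = {
    "run_id": "example-run-id",      # hypothetical ids
    "batch_id": "example-batch-id",
    "timestamp": "2022-12-06T02:15:00Z",
    "stats": {
        "data": [0.91, 0.92, 0.90],  # per-iteration timings
        "unit": "s",
        "iterations": 3,
    },
    "tags": {"name": "tpch", "query_id": 1, "scale_factor": 1},
    "context": {"benchmark_language": "R"},
}

payload = json.dumps(result)
print(sorted(result["tags"]))  # → ['name', 'query_id', 'scale_factor']
```

The point of the PR is that arrowbench itself produces this shape, rather than relying on voltrondata-labs/benchmarks to restructure results before posting.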

@alistaire47 alistaire47 self-assigned this Nov 22, 2022
@alistaire47 alistaire47 marked this pull request as ready for review November 23, 2022 18:00
alistaire47 (Contributor Author):

The failing build is a DuckDB thing that I believe will be addressed by #113.

boshek (Contributor) left a comment:

😍 this is amazing! What a great effort to standardize. No substantive changes, though I think adding your name to the maintainer list is a good idea.

I also wonder, given the version change, whether a NEWS file is merited here. I think an item for this version can still be brief and high-level. Stealing from your comment, something like:

This version of arrowbench aligns the JSON results arrowbench outputs with those currently posted to the Conbench API via voltrondata-labs/benchmarks, so that eventually arrowbench can post its own results directly to a Conbench instance.

@@ -14,7 +14,7 @@
#'
#' @importFrom waldo compare
#' @export
-tpc_h <- Benchmark("tpc_h",
+tpc_h <- Benchmark("tpch",
Contributor:

[MINOR]: Since you are doing this, I wonder if we should also change tpc_h everywhere to tpch. This would include instances in this file and here:

tpc_h_queries <- list()
tpc_h_queries[[1]] <- function(input_func, collect_func = dplyr::collect, con = NULL) {

alistaire47 (Contributor Author):

I'm open to it if we want, though it may (?) conflict in some places with the work to implement datalogistik in #113. These name changes are a pain, but necessary because we've been sending the Python names to Conbench in tags.name. Some of the Python names have - instead of _ (e.g. file-read), though, so we can't make object names exactly match name elements (and eventual tags) unless we go the other way and modify everything in the database back to the R names, which is possible, but harder. Or we could quote object names in R with backticks everywhere and make an object called file-read, but that just seems like one of those R things that's...

[Jurassic Park "could/should" GIF]
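The naming mismatch described above can be sketched as a simple translation table (Python used purely for illustration; "tpc_h" -> "tpch" is from this PR, while the "read_file" entry is a hypothetical example of an R-side name mapping to a hyphenated Python-era name like file-read):

```python
# Translation from R object names (underscores only) to the names
# historically sent to Conbench in tags.name. "read_file" is a
# hypothetical R-side name; "file-read" is the hyphenated form
# mentioned in the thread.
R_TO_CONBENCH = {
    "tpc_h": "tpch",
    "read_file": "file-read",
}

def conbench_name(r_name: str) -> str:
    """Fall back to the R name when no remapping exists."""
    return R_TO_CONBENCH.get(r_name, r_name)

print(conbench_name("tpc_h"))  # → tpch
```

A lookup like this keeps the R objects legally named while still emitting the historical names the Conbench database already contains.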

Contributor:

These name changes are a pain

100%. Thank you for doing them!
I'll leave it up to you. It is not a material difference and really just reflects a slight nit.

Contributor:

This is another example where I wish we had called it tpc-h or tpc_h in what was stored in Conbench in the first place. Let's go with tpch for now to reduce the number of migration bits we need, but it's another good example of how the spirit of this project, getting arrowbench posting to Conbench more directly, will help make sure we put up what we want and not what another shim layer in between had.

-  output <- match.arg(output)
+  output_type = c("arrow_table", "data_frame")) {
# file_type defaults to parquet or feather, but can accept fst as well
file_type <- match.arg(file_type, c("parquet", "feather", "fst"))
Contributor:

Do we need {fst} in here anymore?

alistaire47 (Contributor Author):

We can still run it for comparison if we want, though AFAIK we don't have a policy on when we actually do run external packages and compare against them. It might be nice to run these when there's an Arrow release and include them for context in that report? Not a priority, but maybe useful context.

Or was {fst} deprecated or something? I thought I remembered something like that, but now I don't see it.

Contributor:

I think maybe you are thinking of disk.frame. And I think the subtext here is that it really doesn't matter much to include it, so why remove it? Convinced.

alistaire47 (Contributor Author):

Oh, yep, that was it! Thanks

Contributor:

Yeah, we might need to drop support for fst when we move to datalogistik (or do something slightly different if we were to use it), but for now let's keep it here


d$packages <- NULL

to_list_col <- lengths(d) != 1
d[to_list_col] <- lapply(d[to_list_col], list)

d <- dplyr::as_tibble(d)
-d$did_error <- FALSE
+d$did_error <- !is.null(self$error)
Contributor:

😍

alistaire47 (Contributor Author):

Error handling is an area where we can go further in the future if there's bandwidth and there are enough flaky benchmarks. With the new validation key on /api/benchmarks/, we should really be reporting failure to produce the right thing separately from failure to run properly. And in theory we could populate the error_info and error_type on PUT /api/runs/{run_id}/ too when setup or teardown fails (as opposed to running or validation failures).

But in practice most of these don't fail often enough to make the effort very worthwhile. Which is good!
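The separation described above could be sketched like this (hypothetical helper; the error_info/error_type and validation field names come from the comment's description of the Conbench API, and the exact payload shapes are assumptions):

```python
def error_fields(stage: str, message: str) -> dict:
    """Map where a benchmark failed to which API fields to populate.

    Stages and payload shapes are illustrative, not the actual
    arrowbench implementation.
    """
    if stage in ("setup", "teardown"):
        # Run-level failure: candidate for PUT /api/runs/{run_id}/
        return {"error_info": {"error": message}, "error_type": stage}
    if stage == "validation":
        # The benchmark ran fine but produced the wrong result
        return {"validation": {"success": False, "details": message}}
    # Failure while actually running the benchmark
    return {"error": message}

print(error_fields("setup", "db missing")["error_type"])  # → setup
```

The key design point is routing failures to different API fields depending on whether the run infrastructure, the benchmark itself, or its output validation failed.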

DESCRIPTION (outdated)
@@ -1,7 +1,7 @@
Package: arrowbench
Type: Package
Title: Tools for Continuous and Interactive Benchmarking
-Version: 0.1.0
+Version: 0.1.1
boshek (Contributor), Nov 28, 2022:

Feels to me that this should increment a little more here: 0.2.0.

DESCRIPTION (outdated)
@@ -1,7 +1,7 @@
Package: arrowbench
Type: Package
Title: Tools for Continuous and Interactive Benchmarking
-Version: 0.1.0
+Version: 0.1.1
Authors@R: c(
person("Neal", "Richardson", email = "neal.p.richardson@gmail.com", role = "aut"),
person("Jonathan", "Keane", email = "jkeane@gmail.com", role = c("aut", "cre"))
Contributor:

To me, this is a substantial change and you should probably add your name to this list.

alistaire47 (Contributor Author):

arrow-benchmarks-ci buildkite test build (together with voltrondata-labs/benchmarks#122) passes: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/1740#0184ca79-7c13-407e-ad17-814eed712a28/6-14249

jonkeane (Contributor) left a comment:

Thanks for this, it's looking good. I don't have too many substantive comments.

Comment on lines 1 to 5
on:
push:
branches:
- main
pull_request:
Contributor:

Any reason for only running on push to main? From time to time I definitely appreciate having CI on branches even before they get to the PR stage.

alistaire47 (Contributor Author):

We were running duplicate builds of everything on PRs because they're both branches and pull requests, so I copied this from one of our other repos. We could drop lines 4-5 (I think?) if we want builds on all pushes to branches
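One reading of that suggestion, dropping the branch filter so every push builds, would give a trigger like this sketch (a workflow fragment, not the file as merged; it reintroduces duplicate push and pull_request runs on PR branches):

```yaml
on:
  push:          # no branch filter: runs on every push to any branch
  pull_request:  # also runs on PRs, duplicating PR-branch pushes
```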

Contributor:

We were running duplicate builds of everything on PRs because they're both branches and pull requests

Could you point me to an example of this?

alistaire47 (Contributor Author):

On your PR, if you expand the builds, there are two versions of each, one appended with (pull_request), the other with (push):

[screenshot of duplicate (push) and (pull_request) builds]

jonkeane (Contributor), Dec 2, 2022:

IIUC, those are testing two different things, though (check out the details of the checkouts). The push build is the HEAD of the branch; the pull_request build is the merge of the HEAD of the branch with main. For that specific PR they end up being identical code-wise (since I rebased the branch on main a few moments ago), but they can be, and frequently are, quite different.

push

Checking out the ref
  /usr/local/bin/git checkout --progress --force -B duckdb-ext refs/remotes/origin/duckdb-ext
  Switched to a new branch 'duckdb-ext'
  branch 'duckdb-ext' set up to track 'origin/duckdb-ext'.
/usr/local/bin/git log -1 --format='%H'
'14f9a4c5797ef2529328c050e512e18010737531'

https://github.com/voltrondata-labs/arrowbench/actions/runs/3605143011/jobs/6075232144#step:2:455

pull_request

Checking out the ref
  /usr/local/bin/git checkout --progress --force refs/remotes/pull/119/merge
  HEAD is now at 1dfb8c9 Merge 14f9a4c5797ef2529328c050e512e18010737531 into ada600dea8d6383b802be893b9e26b6b6b38792a
/usr/local/bin/git log -1 --format='%H'
'1dfb8c955aeb476fdc7e6dd055db57cacaa3f2b7'

https://github.com/voltrondata-labs/arrowbench/actions/runs/3605143166/jobs/6075232423#step:2:459

alistaire47 (Contributor Author):

Hm, interesting. Though it seems like pull_request is what I'd expect everywhere except on the default branch. But I can revert.

Authors@R: c(
person("Neal", "Richardson", email = "neal.p.richardson@gmail.com", role = "aut"),
-  person("Jonathan", "Keane", email = "jkeane@gmail.com", role = c("aut", "cre"))
+  person("Jonathan", "Keane", email = "jkeane@gmail.com", role = c("aut", "cre")),
+  person("Edward", "Visel", email = "edward.visel@gmail.com", role = "aut", comment = c(ORCID = "0000-0002-2811-6254"))
Contributor:

🎉

R/bm-df-to-table.R (outdated; thread resolved)
Comment on lines -5 to +7
-#' * `format` One of `c("parquet", "feather", "fst")`
+#' * `file_type` One of `c("parquet", "feather", "fst")`
 #' * `compression` One of the values: `r paste(known_compressions, collapse = ", ")`
-#' * `output` One of `c("arrow_table", "data_frame")`
+#' * `output_type` One of `c("arrow_table", "data_frame")`
Contributor:

I'm fine changing these here if it's easiest to go this direction, though I kinda wish we went the other way (but understand migrating out of that would be a pain in the ass...)

alistaire47 (Contributor Author):

Yeah, I definitely don't love these changes, but they're the easiest way to keep consistency with history. There's nothing stopping us from changing back and migrating the database in the future if we want, though.

Contributor:

Yeah, and if we do decide to do it in the future, doing it separately from these specific changes will make it easier both to do and to diagnose if something goes wrong.

output <- match.arg(output)
output_type = c("arrow_table", "data_frame")) {
# file_type defaults to parquet or feather, but can accept fst as well
file_type <- match.arg(file_type, c("parquet", "feather", "fst"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we might need to drop support for fst when we move to datalogistik (or do something slightly different if we were to use it), but for now let's keep it here

@@ -14,7 +14,7 @@
#'
#' @importFrom waldo compare
#' @export
tpc_h <- Benchmark("tpc_h",
tpc_h <- Benchmark("tpch",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another example where I wish we had called it tpc-h or tpc_h in what was stored in conbench in the first place. Let's go with tpch for now to reduce the number of migration bits we need, but another good example of how the spirit of this project of getting arrowbench -> conbench more directly will help make sure we put up what we want and not what another shim layer between had

@@ -135,6 +135,18 @@ tpc_h <- Benchmark("tpc_h",
( params$engine == "dplyr" & params$format == "native" )
params[!drop,]
},
+  batch_id_fun = function(params) {
+    batch_id <- uuid()
+    paste0(batch_id, "-", params$scale_factor, substr(params$format, 1, 1))
Contributor:

I know this is effectively copy/paste from voltrondata-labs/benchmarks, but ouch, that's kinda gross. Would you mind adding a TODO here to make this a bit nicer? (We probably need to properly stringify the scale factor and also spell out the format more fully.)

alistaire47 (Contributor Author):

Sure. We could also change it now, if you like; the only thing I can think of that might (? not sure) care about consistency here is @boshek's release report.

alistaire47 (Contributor Author):

Put in a TODO for now.
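One possible shape for that TODO (a hypothetical sketch in Python for illustration; the real code is R, and the exact format is still to be decided): keep the uuid, stringify the scale factor explicitly, and use the full format name rather than its first letter.

```python
import uuid

def batch_id_for(scale_factor, fmt: str) -> str:
    """Hypothetical nicer batch id: full format name plus an explicit
    scale-factor prefix, instead of substr(format, 1, 1)."""
    base = uuid.uuid4().hex
    return f"{base}-sf{scale_factor}-{fmt}"

bid = batch_id_for(0.1, "parquet")
print(bid.endswith("-sf0.1-parquet"))  # → True
```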

R/result.R (thread resolved)
R/run.R (outdated)
run_id = NULL,
batch_id = batch_id,
run_reason = NULL,
# timestamp = utc_now_iso_format(), # let default populate
Contributor:

Do we need this comment?

alistaire47 (Contributor Author):

Not really; I just left it so you can see all the fields on the object here, even the ones not getting filled in. Putting the explanation at the end of the line is a little confusing, since it's really why the whole line is commented out; I could put it on a line before, or just drop it altogether. Or we could actually specify timestamp here. It's all minor.

jonkeane (Contributor) commented Dec 1, 2022:

Oh, and also: that release check is actually fixed by https://github.com/voltrondata-labs/arrowbench/pull/117/files#diff-e35a10b8001b4a29a3a67979b010eebbf6285e9b0369c78693662e52d7b9e3b3R12. It's ultimately not going to be needed once we move to datalogistik, but it'll be fixed in main when #117 is merged.

jonkeane (Contributor) left a comment:

This is looking great. Thank you!

@alistaire47 alistaire47 changed the title [DO NOT MERGE] Align results with those of labs/benchmarks and the Conbench API Align results with those of labs/benchmarks and the Conbench API Dec 5, 2022
@alistaire47 alistaire47 merged commit 85c3e7d into main Dec 6, 2022
@alistaire47 alistaire47 deleted the edward/result-structure branch December 6, 2022 02:15