Align results with those of labs/benchmarks and the Conbench API #116
Conversation
The failing build is a DuckDB thing that I believe will be addressed by #113.
😍 this is amazing! What a great effort to standardize. No substantive changes, though I think adding your name to the maintainer list is a good idea.
I also wonder with a version change if a NEWS file is merited here. I think an item for this version can still be brief and high-level. Stealing from your comment, something like:
This version of arrowbench aligns the JSON results arrowbench outputs with those currently posted to the Conbench API via voltrondata-labs/benchmarks so that eventually arrowbench can post its own results directly to a conbench instance
@@ -14,7 +14,7 @@
 #'
 #' @importFrom waldo compare
 #' @export
-tpc_h <- Benchmark("tpc_h",
+tpc_h <- Benchmark("tpch",
[MINOR]: Since you are doing this, I wonder if we should also change tpc_h everywhere to tpch. This would include instances in this file and here:
Lines 6 to 8 in 5abf163
tpc_h_queries <- list()
tpc_h_queries[[1]] <- function(input_func, collect_func = dplyr::collect, con = NULL) {
I'm open if we want to, though it may (?) conflict in some places with the work to implement datalogistik in #113. These name changes are a pain, but necessary because we've been sending the Python names to Conbench in `tags.name`. Some of the Python ones have `-` instead of `_` (e.g. `file-read`), though, so we can't make object names exactly match `name` elements (and eventual tags) unless we go the other way and modify everything in the database back to the R names, which is possible, but harder. Or we could quote object names in R with backticks everywhere and make an object called `file-read`, but that just seems like one of those R things that's best avoided.
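As an aside, the backtick-quoting option mentioned above looks like this in R (a minimal sketch; the `file-read` object here is purely illustrative, not a real arrowbench benchmark):

```r
# Non-syntactic names like `file-read` are legal in R, but only if they
# are backtick-quoted at every definition and every call site:
`file-read` <- function(...) "ran file-read"
`file-read`()

# A syntactic name never needs the quoting:
file_read <- function(...) "ran file_read"
file_read()
```

Forgetting the backticks at any call site is a parse error, which is why matching the Python `-` names exactly would be painful in practice.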
These name changes are a pain
100%. Thank you for doing them!
I'll leave it up to you. It is not a material difference and really just reflects a slight nit.
This is another example where I wish we had called it `tpc-h` or `tpc_h` in what was stored in Conbench in the first place. Let's go with `tpch` for now to reduce the number of migration bits we need, but it's another good example of how the spirit of this project (getting arrowbench -> conbench results flowing more directly) will help make sure we put up what we want and not what another shim layer in between had.
output <- match.arg(output)
output_type = c("arrow_table", "data_frame")) {
# file_type defaults to parquet or feather, but can accept fst as well
file_type <- match.arg(file_type, c("parquet", "feather", "fst"))
Do we need {fst} in here anymore?
We can still run it for comparison if we want, though AFAIK we don't have a policy on when we actually do run external packages and compare against them. It might be nice to run these when there's an Arrow release and include them for context in that report? Not a priority, but maybe useful context.
Or was {fst} deprecated or something? I thought I remembered something like that, but now I don't see it.
I think maybe you are thinking of disk.frame. And I think the subtext here is that it really doesn't matter too much to include it, so why remove it? Convinced.
Oh, yep, that was it! Thanks
Yeah, we might need to drop support for fst when we move to datalogistik (or do something slightly different if we were to use it), but for now let's keep it here
d$packages <- NULL

to_list_col <- lengths(d) != 1
d[to_list_col] <- lapply(d[to_list_col], list)

d <- dplyr::as_tibble(d)
-d$did_error <- FALSE
+d$did_error <- !is.null(self$error)
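The pattern in the snippet above (keep length-1 fields as atomic columns, wrap longer elements so they become list-columns of a one-row tibble) can be sketched with made-up data; only `dplyr` is assumed:

```r
library(dplyr)

# A result-like named list: some scalar fields, one multi-valued field
d <- list(
  name = "tpch",
  real = 1.23,
  params = c(scale_factor = 1, query_id = 6)
)

# Elements with length != 1 cannot sit in a single tibble cell as-is,
# so wrap each one in list() to turn it into a list-column
to_list_col <- lengths(d) != 1
d[to_list_col] <- lapply(d[to_list_col], list)

# One row: name and real are atomic columns; params is a list-column
d <- as_tibble(d)
```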
😍
Error handling is an area where we can go further in the future if there's bandwidth and there are enough flaky benchmarks. With the new `validation` key on `/api/benchmarks/`, we should really be reporting failure to produce the right thing separately from failure to run properly. And in theory we could populate the `error_info` and `error_type` on `PUT /api/runs/{run_id}/` too when setup or teardown fails (and not running or validation).
But in practice most of these don't fail often enough to make the effort very worthwhile. Which is good!
DESCRIPTION
@@ -1,7 +1,7 @@
 Package: arrowbench
 Type: Package
 Title: Tools for Continuous and Interactive Benchmarking
-Version: 0.1.0
+Version: 0.1.1
It feels to me that this should increment a little more here: 0.2.0.
DESCRIPTION
@@ -1,7 +1,7 @@
 Package: arrowbench
 Type: Package
 Title: Tools for Continuous and Interactive Benchmarking
-Version: 0.1.0
+Version: 0.1.1
 Authors@R: c(
     person("Neal", "Richardson", email = "neal.p.richardson@gmail.com", role = "aut"),
     person("Jonathan", "Keane", email = "jkeane@gmail.com", role = c("aut", "cre"))
To me, this is a substantial change and you should probably add your name to this list.
arrow-benchmarks-ci buildkite test build (together with voltrondata-labs/benchmarks#122) passes: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/1740#0184ca79-7c13-407e-ad17-814eed712a28/6-14249
Thanks for this, it's looking good. I don't have too many substantive comments.
.github/workflows/R-CMD-check.yaml
on:
  push:
    branches:
      - main
  pull_request:
Any reason for only running on push to main? I definitely appreciate, from time to time, having CI on branches even before they get to the PR stage.
We were running duplicate builds of everything on PRs because they're both branches and pull requests, so I copied this from one of our other repos. We could drop lines 4-5 (I think?) if we want builds on all pushes to branches.
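For reference, a sketch of the alternative described above, assuming standard GitHub Actions trigger syntax (dropping the branch filter so every push builds, at the cost of reintroducing duplicate builds on PR branches):

```yaml
on:
  push:          # no `branches` filter: build every pushed branch
  pull_request:  # also build the merge result for each PR
```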
We were running duplicate builds of everything on PRs because they're both branches and pull requests
Could you point me to an example of this?
On your PR, if you expand the builds, there are two versions of each: one appended with `(pull_request)`, the other with `(push)`:
IIUC, those are testing two different things, though (check out the details of the checkouts). The `push` build is the `HEAD` of the branch; the `pull_request` build is the merge of the `HEAD` of the branch + main. For that specific PR they end up being identical code-wise (since I rebased the branch on main a few moments ago), but they can be, and frequently are, quite different.
`push`:
Checking out the ref
/usr/local/bin/git checkout --progress --force -B duckdb-ext refs/remotes/origin/duckdb-ext
Switched to a new branch 'duckdb-ext'
branch 'duckdb-ext' set up to track 'origin/duckdb-ext'.
/usr/local/bin/git log -1 --format='%H'
'14f9a4c5797ef2529328c050e512e18010737531'
https://github.com/voltrondata-labs/arrowbench/actions/runs/3605143011/jobs/6075232144#step:2:455
`pull_request`:
Checking out the ref
/usr/local/bin/git checkout --progress --force refs/remotes/pull/119/merge
HEAD is now at 1dfb8c9 Merge 14f9a4c5797ef2529328c050e512e18010737531 into ada600dea8d6383b802be893b9e26b6b6b38792a
/usr/local/bin/git log -1 --format='%H'
'1dfb8c955aeb476fdc7e6dd055db57cacaa3f2b7'
https://github.com/voltrondata-labs/arrowbench/actions/runs/3605143166/jobs/6075232423#step:2:459
Hm, interesting. Though it seems like `pull_request` is what I'd expect, aside from on the default branch. But I can revert.
 Authors@R: c(
     person("Neal", "Richardson", email = "neal.p.richardson@gmail.com", role = "aut"),
-    person("Jonathan", "Keane", email = "jkeane@gmail.com", role = c("aut", "cre"))
+    person("Jonathan", "Keane", email = "jkeane@gmail.com", role = c("aut", "cre")),
+    person("Edward", "Visel", email = "edward.visel@gmail.com", role = "aut", comment = c(ORCID = "0000-0002-2811-6254"))
🎉
-#' * `format` One of `c("parquet", "feather", "fst")`
+#' * `file_type` One of `c("parquet", "feather", "fst")`
 #' * `compression` One of the values: `r paste(known_compressions, collapse = ", ")`
-#' * `output` One of `c("arrow_table", "data_frame")`
+#' * `output_type` One of `c("arrow_table", "data_frame")`
I'm fine changing these here if it's easiest to go this direction, though I kinda wish we went the other way (but understand migrating out of that would be a pain in the ass...)
Yeah, I definitely don't love these changes, but they're the easiest way to keep consistency with history. There's nothing stopping us from changing back and migrating the database in the future if we want, though.
Yeah, and if we do decide to do it in the future, doing it separately from these specific changes will make it easier both to do and to diagnose if something goes wrong.
@@ -135,6 +135,18 @@ tpc_h <- Benchmark("tpc_h",
       ( params$engine == "dplyr" & params$format == "native" )
     params[!drop,]
   },
+  batch_id_fun = function(params) {
+    batch_id <- uuid()
+    paste0(batch_id, "-", params$scale_factor, substr(params$format, 1, 1))
I know this is effectively copy/paste from voltrondata-labs/benchmarks, but ouch, that's kinda gross. Would you mind adding a TODO here to make this a bit nicer? (We probably need to properly stringify the scale factor and also have the format be full(er).)
Sure. We could also change it now, if you like; the only thing I can think of that might (? not sure) care about consistency here is @boshek's release report.
Put in a `TODO` for now.
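For later reference, a rough sketch of what that TODO might resolve to (entirely hypothetical; `uuid()` is the helper already used in the diff above, and the exact formatting here is a guess):

```r
# Hypothetical tidier batch_id_fun: stringify the scale factor explicitly
# and keep the full format name instead of only its first character
batch_id_fun <- function(params) {
  scale <- format(params$scale_factor, trim = TRUE, scientific = FALSE)
  paste(uuid(), scale, params$format, sep = "-")
}
```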
R/run.R
run_id = NULL,
batch_id = batch_id,
run_reason = NULL,
# timestamp = utc_now_iso_format(), # let default populate
Do we need this comment?
Not really, I just left it so you can see all the fields on the object here even if they're not getting filled in here. It's a little confusing to put it at the end of the line as it's really why the line is commented out; I could put it as a line before, or just drop it all. Or we could actually specify timestamp here. It's all minor.
Oh, and also: that release check is actually fixed by https://github.com/voltrondata-labs/arrowbench/pull/117/files#diff-e35a10b8001b4a29a3a67979b010eebbf6285e9b0369c78693662e52d7b9e3b3R12. It's ultimately not going to be needed once we move to datalogistik, but it'll be fixed in main when #117 is merged.
This is looking great. Thank you!
This PR changes a number of things, but all with the goal of aligning the JSON results arrowbench outputs with those currently posted to the Conbench API via voltrondata-labs/benchmarks so that eventually arrowbench can post its own results directly.
Changes fall into two categories:
* ... the `/api/benchmarks/` endpoint

This PR cannot be merged until voltrondata-labs/benchmarks#122 is merged, or it will break current arrow benchmarks. That PR contains code to handle changes to assorted names and parameter names here. It has been tested with that branch.