Use benchconnect to publish results to a Conbench server #127

Merged
alistaire47 merged 8 commits into main from edward/publish-results on Jan 23, 2023

Conversation

alistaire47
Contributor

This PR:

  1. integrates benchconnect with the run.BenchmarkDataFrame() S3 method when publish = TRUE to allow arrowbench to publish directly to a Conbench server instead of requiring wrapping via voltrondata-labs/benchmarks.
  2. adapts the management of the external benchconnect dependency from [WIP] Add datalogistik as datasource #113, leaving the bare bones of datalogistik installation in place as well, though obviously not the full integration that PR entails.
  3. adds {processx} as a dependency, as it allows finer control than system(). In the future we may consider moving some of the preexisting system() calls to processx::run() as well if they get troublesome.
  4. aligns with Add environment variables for posting metadata conbench/conbench#587 in its use of environment variables for passing things like github metadata.
  5. populates errored benchmarks sufficiently to publish them to Conbench, closing Populate result properly when erroring so it can be posted #121. It does not populate validation or the run-level error fields yet, but it will at least let us see what's failing provided the whole run does not fail.
  6. Unit tests call benchconnect augment, but mock calls that require an active Conbench server (benchconnect start run, benchconnect submit result, and benchconnect finish run). Add integration tests arrow-benchmarks-ci#94 would integration test the full toolchain; for now I've run it against a local server instance successfully.

This publishing will not be active in real benchmark running until arrow-benchmarks-ci is rearranged to call it.
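For orientation, invoking a publishing run end to end would look roughly like this. This is only a sketch: the CONBENCH_* client env var names are my assumption about how benchconnect picks up server details, and the values are placeholders.

# Rough sketch, not a tested recipe: assumes benchconnect is installed and on PATH, and
# that it reads the Conbench client settings from these env vars (placeholder values here).
Sys.setenv(
  CONBENCH_URL = "http://127.0.0.1:5000",
  CONBENCH_EMAIL = "user@example.com",
  CONBENCH_PASSWORD = "conbench"
)

bm_df <- get_package_benchmarks()  # a BenchmarkDataFrame of the package's benchmarks
run(bm_df, publish = TRUE)         # augments and submits each result, then finishes the run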

@alistaire47 alistaire47 self-assigned this Jan 13, 2023
@alistaire47
Contributor Author

@austin3dickey I can't add you as a reviewer, but please take a look anyway if you've got time

Contributor

@jonkeane jonkeane left a comment

This is a great start, thank you!

A few comments below; the main ones are asking for a few more code comments here and there, plus a question about benchconnect and how info gets passed around.

@@ -20,16 +20,68 @@ run.default <- function(x, ...) {
}


#' @param publish Flag for whether to publish results to a Conbench server. See
#' "Environment Variables" section for how to specify server details. Requires
#' the benchconnect CLI is installed; see [install_benchconnect()].
Contributor

This should autolink correctly, yeah? But regardless we want code formatting there

Suggested change
#' the benchconnect CLI is installed; see [install_benchconnect()].
#' the benchconnect CLI is installed; see [`install_benchconnect`].

Contributor Author

It does already; this is one of the automagics of the markdown update:
[screenshot]

@@ -20,16 +20,68 @@ run.default <- function(x, ...) {
}


#' @param publish Flag for whether to publish results to a Conbench server. See
Contributor

It might help to have a name + description for this function (or is that elsewhere already?)

Contributor Author

Yeah, that's on the generic. Currently only this method is really useful and all the docs are particular to it, but eventually it might be nice to write a method that would dispatch to run_benchmark() for a single benchmark.

R/run.R
Comment on lines 70 to 78
withr::with_envvar(
  c(CONBENCH_RUN_NAME = run_name, CONBENCH_RUN_REASON = run_reason),
  { ress <- run_benchmark(bm = bm, params = params, ...) }
)
if (publish) {
  ress$results <- lapply(ress$results, augment_result)
  lapply(ress$results, submit_result)
}
ress
Contributor

Is it possible to pass run_name and run_reason through in the results JSON? It's not a huge deal, but it seems a little bit funny that we would pass information through both env vars and JSON here rather than relying on an all-JSON approach (or in principle an all-env-var approach, but that would be silly!).

I know that there are good reasons to have things like CONBENCH_RUN_NAME exist independently, but I thought that benchconnect allowed specifying through JSON or would pick up the defaults from the envvars if nothing was passed through the JSON. Am I confused about that?

Contributor

Also: what's the relation with the env vars being used in run below? If I'm reading this correctly it looks like those use the unprocessed versions of those env vars (e.g. before lines 58-63 process some of the contents of them)?

Contributor Author

Is it possible to pass run_name and run_reason through in the results JSON? It's not a huge deal, but it seems a little bit funny that we would pass information through both env vars and JSON here rather than relying on an all-JSON approach (or in principle an all-env-var approach, but that would be silly!).

I know that there are good reasons to have things like CONBENCH_RUN_NAME exist independently, but I thought that benchconnect allowed specifying through JSON or would pick up the defaults from the envvars if nothing was passed through the JSON. Am I confused about that?

Yeah, this was one of the weird design decisions I made. So in theory, it is possible to pass these all the way through as arguments to each level of the runner. But at the bottom, we actually need to set them as env vars anyway to make sure they get passed across to the new process created by run_script() so they'll get picked up in the BenchmarkResult$new() call in the run_bm() call in the script. And we need them in-between, too, because there's a second BenchmarkResult$new() call in the original process for when the script errors.

There are other potential ways besides env vars to pass them across to run_bm(), e.g. some form of currying, but they mostly seem worse. And if we take that decision as fixed, then changing all the intermediate signatures to just pass them through seems superfluous when we can just let the env vars carry them through. But this is an odd design, so if we'd rather, I can rearrange all the signatures and create these env vars in run_script().

Also: what's the relation with the env vars being used in run below? If I'm reading this correctly it looks like those use the unprocessed versions of those env vars (e.g. before lines 58-63 process some of the contents of them)?

I don't think I understand the question? These mostly don't change, although when we consume them we do change "" (which Sys.getenv() will return even if it's not set) to NULL.
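Concretely, the ""-to-NULL handling mentioned here amounts to something like the following (the helper name is made up for illustration; the real code may do this inline):

# Sys.getenv() returns "" when a variable isn't set; at consumption time we treat that as NULL.
null_if_empty <- function(x) if (identical(x, "")) NULL else x

run_name <- null_if_empty(Sys.getenv("CONBENCH_RUN_NAME"))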

Contributor

I'm fine to leave it as is if it's the least-bad option (especially as we transition to this newer way of running things). It might be helpful to have some of the context you put here in comment(s) explaining why what looks a bit off is the best option for now.

Also: what's the relation with the env vars being used in run below? If I'm reading this correctly it looks like those use the unprocessed versions of those env vars (e.g. before lines 58-63 process some of the contents of them)?

I don't think I understand the question? These mostly don't change, although when we consume them we do change "" (which Sys.getenv() will return even if it's not set) to NULL.

I was mostly wondering about:

    if (is.null(run_name)) {
      run_name <- paste(run_reason, github$commit, sep = ": ")
    }

Where we transform the run_name if it's null. Maybe this is also something we put in the comments about what's going on so it's clear(er) to our future selves what we did + why

Contributor Author

Oh! Yeah, that's because I started using f"{run_reason}: {github$commit}" as a default run_name on the Python side for legacy reasons and, well, anything is better than nothing, so I ported it over here so we're not doing the same thing in arrow-benchmarks-ci or wherever just so there's something in there.

We could stick that default in BenchmarkRun and BenchmarkResult if we want, though that does get a bit tricky about handling updating (e.g. if run_reason is updated, run_name probably should get updated if it was a previously-generated default, but not if it was previously specified, so you have to be able to tell the difference). That felt overly complicated, so, well, this.

Another possibility would be to do it in the signature with lazy eval like

run.BenchmarkDataFrame <- function(x, 
                                   ..., 
                                   publish = FALSE, 
                                   github = github_info(), 
                                   run_reason = NULL, 
                                   run_name = paste(run_reason, github$commit, sep = ": ")) {

but the drawback there is that github$commit will be NULL for any non-source Arrow installation because arrow::arrow_info()$build_info$git_id isn't populated on precompiled binaries. On top of that, users running but not publishing won't bother to set run_reason either, so their default run_name will be ": ", which is pretty silly. We could do

                                   run_name = if (publish) paste(run_reason, github$commit, sep = ": ")) {

instead, but maybe that's too clever?

Anyway, sorry that's a bit long-winded, but that's my thought process. If any of the alternatives seem desirable to you, I'm happy to change it; otherwise I'll put a comment in there on why if I can figure out how to make it shorter than this Treatise on Defaults for run_name.


I'm fine to leave it as is if it's the least-bad option (especially as we transition to this newer way of running things). It might be helpful to have some of the context you put here in comment(s) explaining why what looks a bit off is the best option for now.

Honestly I'm not totally convinced myself that I made the right call here; maybe I should just change all the signatures. I might try and see if I find that more distasteful than this, and if so, put an explanation in a comment here.

Another option here would be to force these params to be passed via env var (which is a bit extreme), or allow it by making the defaults `run_name = Sys.getenv("CONBENCH_RUN_NAME")`, `run_reason = Sys.getenv("CONBENCH_RUN_REASON")`. Either way, it would be promoting them to first-class env vars exposed to external users instead of just used internally, and would make the arrow-benchmarks-ci setup a little more concise, basically

bm_df <- get_package_benchmarks()
bm_df <- bm_df[bm_df$name %in% c("vector", "of", "benchmark", "names"), ]
run(bm_df, publish = TRUE)

instead of explicitly splicing in all the env vars, at the cost of making it even harder to see how all the pieces therein get their metadata—all the github stuff is already hidden this way, as are the Conbench client setup values (URL, user, password). I guess I'm conflicted because regardless it feels like I'm adding to the complexity of this codebase and making it harder to maintain. I'm starting to get the appeal of config files and objects corresponding to them so at least everything is in one place. (Not that I'm suggesting that here; it's very un-R-ic.)

Contributor

Hot take. I know you've put lots of effort/code into ensuring run name has sensible defaults, but because of a lot of recent work we really don't need run_name to be "nice" anymore. In fact, it's almost superfluous. It doesn't change any Conbench behavior anymore, and it's only shown in the webapp on the Benchmark page, where you can also easily see the run reason and commit hash if you want to.

So if it would be easier/cleaner to soft-deprecate run name, leave it blank or make it a UUID, and not make it possible for a user to change that, maybe we should consider that.

Contributor Author

I'm cool with nulling it out, but code like this makes me nervous it will cause issues somewhere. Making it a UUID seems duplicative when we've already got run_id.

Contributor

Oh good catch, there is one place it's used -- the bottom of the /compare/runs/A...B page in a picker widget.

I just made conbench/conbench#607 to look more into deprecation of run_name on the conbench side. In the meantime, I still think it already is redundant and everything would be fine with a UUID here.

Contributor

@jonkeane jonkeane Jan 20, 2023

Without seeing the actual code consequences here I'm finding it a bit hard to see the advantages of each of the options listed. It sounds like (though correct me if I'm wrong; there were a number of options proposed and I might not be following the consequences of each perfectly!) we have a few less-than-good options where the complexity lives either in passing things through signatures or in using env vars for some things but not others, with that funky bifurcation.

Another option here would be to force these params to be passed via env var (which is a bit extreme), or allow it by making the defaults `run_name = Sys.getenv("CONBENCH_RUN_NAME")`, `run_reason = Sys.getenv("CONBENCH_RUN_REASON")`. Either way, it would be promoting them to first-class env vars exposed to external users instead of just used internally, and would make the arrow-benchmarks-ci setup a little more concise, basically

This (use env vars, making them the defaults) seems least bad so far (especially for what we need to accomplish right now). Currently the only callers of arrowbench that need to publish are CI systems (and specifically only arrow-benchmarks-ci), so I'm not particularly worried about that interface being annoying for humans to use (yet).

I would be in favor of doing the least-bad thing here (along with commenting heavily where there is even a little bit of funk) and getting this stuff merged in + actually running (since currently all of this is theoretical and never being called anywhere). I'm certain we will learn as we use this for real, and we can revisit + polish this design when we get real feedback.
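For concreteness, that envvar-defaults option would look something like this in the signature (just a sketch of the idea, not necessarily the final shape):

# Sketch of the option discussed above, not what was ultimately merged:
run.BenchmarkDataFrame <- function(x,
                                   ...,
                                   publish = FALSE,
                                   run_name = Sys.getenv("CONBENCH_RUN_NAME"),
                                   run_reason = Sys.getenv("CONBENCH_RUN_REASON")) {
  # ...
}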

Contributor Author

Alright, managed to rip out the new env vars wholesale by adding to signatures. It's not horrible, and I think it's less confusing.

R/run.R
Comment on lines 81 to 83
if (publish) {
  finish_run(run = bm_run)
}
Contributor

Maybe we should add a TODO here about putting this in an on.exit or other kind of finalizer in case there are issues up above.
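Something like this is what I have in mind (a sketch; exact placement would depend on where the run gets started):

# Register finish_run() as an exit handler so the run still gets closed out on the
# Conbench server even if something above errors partway through.
if (publish) {
  on.exit(finish_run(run = bm_run), add = TRUE)
}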

Contributor Author

Oh, this is a good idea. Rather than wait, I'm going to do this in this PR, because I managed to create a lot of bad state I had to clean up manually while getting this working, and I really don't want us to have to shell into these machines to do the same.


{withr} is one of my favorite dependencies xD

Contributor Author

done

Contributor

@boshek boshek left a comment

This looks great thanks! I don't have any huge comments.

I did try out the installation process and after some paths futzing it did work. The installation proceeded super smoothly. Nice. Paths promise to be a challenge but if there are only a few users here then smoothing that out might not be worth the effort.

One possible TODO from this (and probably something that could happen when you tackle the CI stuff) is a write-up of how one would do this, from benchmark to posting. In fact it might be useful for someone like me, who is less close to this, to actually write that document, and then you can tell me all the ways that I am wrong. If that sounds useful, let me know when you take on the CI part of this and we can collaborate.

)
}

warning(warningCondition(msg, class = "notInstalledWarning"))
Contributor

nice

#'
#' @export
install_pipx <- function() {
  system('pip install pipx && pipx ensurepath', intern = TRUE)
Contributor

Potentially naive question, but is there any reason we aren't using processx::run here if only for consistency?

Contributor Author

Ha, this is mostly because I wrote this before I ran into the issues which drove me to processx::run() in the first place 😄

We do still have a number of system() calls; not just these, but also the very important one here that runs the script. These are easy to change over, but that one might be trickier—I'm pretty sure there are some subtleties in how that's set up, and I really don't want to bork that one. It's very likely worth changing, but I'd want to make sure Jon has time to do a very thorough review of that particular change.
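For reference, the processx version of that install_pipx() call would be something like this (just a sketch of the consistency change, not part of this PR):

# Equivalent of the shell one-liner, split into two explicit processx calls.
processx::run("pip", c("install", "pipx"), echo = TRUE)
processx::run("pipx", "ensurepath", echo = TRUE)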

install_benchconnect <- function() {
  stopifnot(pipx_available())

  url <- "benchconnect@git+https://github.com/conbench/conbench.git@main#subdirectory=benchconnect"
Contributor

I do wonder if it might make sense to wrap this in a function so that a) we could add other bits from conbench more easily if ever needed and b) at least bake in some ability to change which branch/ref benchconnect is using (much like below).
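Roughly what I'm imagining, with made-up names (the ref argument is the part I care about):

# Hypothetical helper: builds the pipx-installable URL for any tool in the conbench repo,
# with the ref configurable so a branch version can be installed during development.
conbench_pipx_url <- function(package = "benchconnect", ref = "main") {
  paste0(package, "@git+https://github.com/conbench/conbench.git@", ref,
         "#subdirectory=", package)
}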

Contributor Author

Yeah, I've already run into some issues in development where I needed a branch version of benchadapt (a dep of benchconnect) installed, which led to me learning more about pipx.

Long-term, though, the solution here is to put benchconnect and datalogistik on PyPI and/or conda-forge so we can use more normal Python dependency management instead; installing from GitHub has a number of unpleasant complications. I think Austin has some thoughts on that and our general monorepo strategy for Conbench.

  external_cli_available(cli = "benchconnect")
}

datalogistik_available <- function() {
Contributor

I tried messing around with how VS Code versus RStudio picks up these paths. Though it is likely something that I've done locally, I just wanted to highlight that VS Code was only able to pick up the paths from the .Renviron file. I only raise this because if this got wider usage, this issue would likely crop up. Likely fine for now, though, given the limited use of this package for the specific task of arrow R benchmarks.

Contributor Author

Yeah, to capture some of our conversation from yesterday where it should be: mostly this shouldn't be too much of an issue for CI, since everything is run from the command line, but users running from apps like RStudio may need to mess with PATH via .Renviron or otherwise.

Again, since benchconnect only really needs to be installed where we publish, this probably isn't too much of an issue because that's a lot of machines handled by CI, but for datalogistik, this matters more since benchmark authors will also need it to work locally.
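For anyone who hits this in RStudio, the in-session version of that PATH tweak is roughly the following; the ~/.local/bin location is an assumption (pipx ensurepath reports the real directory), and .Renviron is the persistent route:

# Session-level sketch only; adjust the directory to wherever pipx actually puts its shims.
Sys.setenv(
  PATH = paste(Sys.getenv("PATH"), path.expand("~/.local/bin"), sep = .Platform$path.sep)
)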

host_name <- "fake-computer"
github <- list(
  commit = "fake-commit",
  repository = "https://github.com/conchair/conchair",
Contributor

🪑

@alistaire47
Contributor Author

I did try out the installation process and after some paths futzing it did work. The installation proceeded super smoothly. Nice. Paths promise to be a challenge but if there are only a few users here then smoothing that out might not be worth the effort.

In theory, we only need benchconnect available on machines from which we publish benchmarks, so provided we can get it installed in CI, we're pretty much ok. But datalogistik will have to be installed everywhere we run benchmarks (including locally), so ultimately we do need this system to work well enough that benchmark authors have a good experience. That's probably never going to include an enormous number of people, so I don't think that means CRAN standards, necessarily, but we do want it to be a good experience regardless.

One possible TODO from this (and probably something that could happen when you tackle the CI stuff) is a write-up of how one would do this, from benchmark to posting. In fact it might be useful for someone like me, who is less close to this, to actually write that document, and then you can tell me all the ways that I am wrong. If that sounds useful, let me know when you take on the CI part of this and we can collaborate.

Like the process of writing a new benchmark and getting it running via CI? That sounds useful, though yeah, it's dependent on my next CI story. We could use it to revamp the README a bit (it's got a lot of useful stuff there, but could use some curation to align with user stories), maybe? Or turn it into a vignette if we don't want to touch that.

@boshek
Contributor

boshek commented Jan 19, 2023

it's dependent on my next CI story

When you get there let me know. I think this is probably a useful exercise.

@alistaire47
Contributor Author

Ok, this PR should be pretty much good to merge now, if y'all could take a quick look through the changes since your last reviews 🙏

@jonkeane
Contributor

Let's merge this and get it into use. We're at the point where we need to shake out issues by actually using this. We might come back and refine more; that's 100% OK and what we should do.

@alistaire47 alistaire47 merged commit cd0592f into main Jan 23, 2023
@alistaire47 alistaire47 deleted the edward/publish-results branch January 23, 2023 22:50