Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib: Move file filters into runners #1646

Merged
merged 4 commits into from
Sep 16, 2024
Merged

stdlib: Move file filters into runners #1646

merged 4 commits into from
Sep 16, 2024

Conversation

V-FEXrt
Copy link
Contributor

@V-FEXrt V-FEXrt commented Sep 9, 2024

Moves the fnInput/fnOutput functions into the runners so they can directly control them. Allows things like the cache runners to cache only actual output files

@@ -60,7 +60,7 @@ export def mkJobCacheRunner (hashFn: RunnerInput => Result String Error) (wakero
def job_cache_add str = prim "job_cache_add"

def run (job: Job) (input: RunnerInput): Result RunnerOutput Error =
def (RunnerInput label cmd vis env dir stdin _ prefix _ _) = input
def (RunnerInput label cmd vis env dir stdin _ prefix _ _ _ _) = input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid not to use fnInputs or fnOutputs? Is that not part of the runner's contract with the Plan and Job?
Or are we not using them here because this function calls another run function that does call those input/output functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid not to use fnInputs or fnOutputs?

Nope, it must always be called at least once. In this case, the cache runners don't call it because the runner they are wrapping is expected to call it

Copy link
Collaborator

@ag-eitilt ag-eitilt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable thing for it to own. Slight documentation expansion would be nice, but the code seems good.

Copy link
Contributor

@colinschmidt colinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@colinschmidt colinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@ag-eitilt ag-eitilt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the cleanable list interact with the FnOutputs, for existing code? I definitely prefer how much more explicit this setup is, but I want to be sure that existing code which obviously doesn't interact with CleanableOutputs doesn't break, seeing as the function only seems to be applied as def fileOutputs = fnOutputs cleanable in the runners:

# Allow an untracked file to be removed via `wake --clean`
export target markFileCleanable (filepath: String): Result Unit Error =
def hashPlan cmd =
Plan "" cmd Nil Nil "." "" logNever logError logDebug ReRun Nil hashUsage identity identity False
def job =
hashPlan ("<hash>", filepath, Nil)
| setPlanLabel "hash: {filepath}"
| setPlanFnOutputs (\_ filepath, Nil)
| runJobWith localRunner
| setJobInspectVisibilityHidden
if job.isJobOk then
Pass Unit
else
failWithError "Failed to hash {filepath}"

And, actually, do we want to introduce a new Plan.FnCleanable to provide end users with more control over that list as well?

@colinschmidt
Copy link
Contributor

How does the cleanable list interact with the FnOutputs, for existing code? I definitely prefer how much more explicit this setup is, but I want to be sure that existing code which obviously doesn't interact with CleanableOutputs doesn't break, seeing as the function only seems to be applied as def fileOutputs = fnOutputs cleanable in the runners:

Existing code calls CleanableOutputs All_Outputs or something and is always populated with the unfiltered outputs in runAlways. User code that implements its own runner will now have to fill in this field in the RunnerOutput to update to this version of wake which should not be a high burden given they are maintaining their own Runner already.

And, actually, do we want to introduce a new Plan.FnCleanable to provide end users with more control over that list as well?

I think this is a good idea but needs to be carefully described as something even less likely the user wants to play with than Plan.FnOutputs.

@V-FEXrt
Copy link
Contributor Author

V-FEXrt commented Sep 16, 2024

How does the cleanable list interact with the FnOutputs, for existing code? I definitely prefer how much more explicit this setup is, but I want to be sure that existing code which obviously doesn't interact with CleanableOutputs doesn't break

Yep its a tiny bit subtle but the tl;dr is that previously Runners were required to return "All outputs" that they knew about then runAlways did two things. 1. Pass the unfiltered list of all outputs into the now renamed but identical "AllOutputs" 2. run fnOuputs over the all outputs list and pass that into "outputs". Now the runner directly controls both lists but trivially does the exact same thing.

One subtle and interesting thing is that "AllOutputs" was a very strong misnomer since that list is always empty for the localRunner while the "filtered" outputs of the local runner has the actual discovered outputs. So in the runtime we actually get something like

cleanable_outputs = "all outputs as reported by runner"
hashed_real_outputs = FnOutputs cleanable_outputs (which may increase the size of the list)
unhashed_real_cleanable = cleanable_outputs - hashed_real_outputs

So the only things actually processed separately are items that are in cleanable_outputs but not in "filtered" outputs

And, actually, do we want to introduce a new Plan.FnCleanable to provide end users with more control over that list as well?

Lets punt on that for a bit. I can see it being useful but this change is already large, not one is currently asking for it, and its rather hard to explain what should go into it (esp since the cache won't store files in that list).

@V-FEXrt V-FEXrt merged commit bdc0112 into master Sep 16, 2024
11 checks passed
@V-FEXrt V-FEXrt deleted the stdlib-job-filter branch September 16, 2024 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants