Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission workflow #13

Merged
merged 6 commits into from
Nov 26, 2023
Merged

Submission workflow #13

merged 6 commits into from
Nov 26, 2023

Conversation

emmamendelsohn
Copy link
Collaborator

@emmamendelsohn emmamendelsohn commented Nov 13, 2023

See comments here: #12

minioclient::mc_alias_set("efi", "data.ecoforecast.org", "", "")
minioclient::mc_cp(dir,
paste0("efi/spat4cast-submissions/", theme),
paste0("efi/spat4cast-submissions/", theme, "/", fire), # should we have separate buckets for each fire?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, that's probably easiest. paralleling the existing conventions in efi, I think we should refer to theme as the variable and the fire location as the site_id, and let's express this in hive format, so maybe something like:

glue("efi/spat4cast-submissions/variable={theme}/site_id={fire}")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh somewhere in here we need a model_id={model_id} as well, since we will have different models (from the same or different teams of people) forecasting the same site_id / reference_datetime etc. (I guess we currently have that in the filenames but might help to have it explicitly on the path).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you suggest validating model_id? would we need to provide guidance on how to identify the team and model type?


# Check file type
assertthat::assert_that(n_distinct(tools::file_ext(submitted_files)) == 1)
assertthat::assert_that(unique(tools::file_ext(submitted_files)) == "tif", msg = "Error: The file extension in the directory is not 'tif'. Please check the file format.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. 🎉

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we probably want a dictionary of valid site_id names (ideally this will come from some spatial vector table we give people with site_id , date of fire/disturbance, and the polygon geometry).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, added some documentation and set the function to enforce valid site_id and variable names:
https://github.com/eco4cast/modis-lai-forecast/blob/submission-workflow/R/spat4cast_submit.R#L5-L20

@cboettig
Copy link
Collaborator

This is awesome! A few comments inline above. @rqthomas want to give this a look over? any opinions on the order of partitions in the hive labelling scheme?

@emmamendelsohn
Copy link
Collaborator Author

emmamendelsohn commented Nov 14, 2023

@cboettig I incorporated your suggestions. The hive-based file structure seems to be working as expected now, as is the move function (see the spat4cast-forecasts/ bucket).

Do we need any logic around checking duplicates? Or allowing users to overwrite submissions?

@rqthomas
Copy link
Collaborator

A reference_date partition is needed to prevent overwritting forecasts.

out <- glue::glue("efi/spat4cast-submissions/variable={variable}/site_id={site_id}/model_id={model_id}/reference_date={reference_date}") 

add reference date as @rqthomas suggests
@cboettig cboettig merged commit ad23c66 into main Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants