-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submission workflow #13
Conversation
R/spat4cast_submit.R
Outdated
minioclient::mc_alias_set("efi", "data.ecoforecast.org", "", "") | ||
minioclient::mc_cp(dir, | ||
paste0("efi/spat4cast-submissions/", theme), | ||
paste0("efi/spat4cast-submissions/", theme, "/", fire), # should we have separate buckets for each fire? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, that's probably easiest. paralleling the existing conventions in efi, I think we should refer to theme as the variable
and the fire location as the site_id
, and let's express this in hive format, so maybe something like:
glue("efi/spat4cast-submissions/variable={theme}/site_id={fire}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh somewhere in here we need a model_id={model_id}
as well, since we will have different models (from the same or different teams of people) forecasting the same site_id / reference_datetime etc. (I guess we currently have that in the filenames but might help to have it explicitly on the path).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would you suggest validating model_id
? would we need to provide guidance on how to identify the team and model type?
R/spat4cast_submit.R
Outdated
|
||
# Check file type | ||
assertthat::assert_that(n_distinct(tools::file_ext(submitted_files)) == 1) | ||
assertthat::assert_that(unique(tools::file_ext(submitted_files)) == "tif", msg = "Error: The file extension in the directory is not 'tif'. Please check the file format.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great. 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we probably want a dictionary of valid site_id names (ideally this will come from some spatial vector table we give people with site_id
, date of fire/disturbance, and the polygon geometry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, added some documentation and set the function to enforce valid site_id and variable names:
https://github.com/eco4cast/modis-lai-forecast/blob/submission-workflow/R/spat4cast_submit.R#L5-L20
This is awesome! A few comments inline above. @rqthomas want to give this a look over? any opinions on the order of partitions in the hive labelling scheme? |
…ure and adding mode_id. adding roxygen documentation.
@cboettig I incorporated your suggestions. The hive-based file structure seems to be working as expected now, as is the move function (see the Do we need any logic around checking duplicates? Or allowing users to overwrite submissions? |
A reference_date partition is needed to prevent overwritting forecasts.
|
add reference date as @rqthomas suggests
See comments here: #12