docs: 📝 add start of the creating resources guide #810

Open
wants to merge 20 commits into
base: main

signekb
Member

@signekb signekb commented Oct 24, 2024

Description

This adds the guide on creating and managing data resources. It is not complete as there are some things I don't know how they may work from an implementation perspective, but this is a good starting place for us.

Closes #759

This PR needs an in-depth review.

@lwjohnst86 lwjohnst86 changed the title docs: 🚧 draft of creating and managing resources docs: 📝 added start of the creating resources guide Feb 18, 2025
@lwjohnst86 lwjohnst86 marked this pull request as ready for review February 18, 2025 09:08
@lwjohnst86 lwjohnst86 requested a review from a team as a code owner February 18, 2025 09:08
@signekb signekb changed the title docs: 📝 added start of the creating resources guide docs: 📝 add start of the creating resources guide Feb 19, 2025
Member Author

@signekb signekb left a comment

Nice 👍 Just some suggestions.
(bc I made the PR, I can't request changes or approve, only comment - so to show that I actually request changes, I have manually moved this PR to in progress on the board)

lwjohnst86 and others added 6 commits February 20, 2025 09:19
@signekb
Member Author

signekb commented Feb 27, 2025

@lwjohnst86 Is this ready for review, or are there more changes on the way?

@lwjohnst86
Member

@signekb yup!

@signekb
Member Author

signekb commented Feb 28, 2025

I’ll start reviewing this now 🚀

Member Author

@signekb signekb left a comment

Very nice seeing this coming together. I have some suggestions and thoughts :)
(since I created this PR, I can only “Comment”)

Comment on lines 278 to 279
In this case, we don't want to add anything else, so we'll write the
text to the `README.md` file:
Member Author

Couldn’t this be a nice place to show how this would be done, actually?

Member

I'm not sure what you mean. Could you expand?

@signekb signekb mentioned this pull request Feb 28, 2025
2 tasks
lwjohnst86 pushed a commit that referenced this pull request Mar 4, 2025
…1099)

## Description

To fit #810 and the current [naming
scheme](https://sprout.seedcase-project.org/docs/design/architecture/naming#actions).

<!-- Select quick/in-depth as necessary -->
This PR needs a quick review.

## Checklist

- [X] Added or updated tests
- [X] Ran `just run-all`
lwjohnst86 added a commit that referenced this pull request Mar 4, 2025
## Description

This PR adds a `path_readme()` which was referred to in a TODO item in
#810
This function will probably have to be rewritten a bit when we fully
switch to the “local-first” approach, but I thought it made sense to add
it now, so we don’t forget about it.

<!-- Select quick/in-depth as necessary -->
This PR needs a quick review.

## Checklist

- [X] Added or updated tests
- [X] Ran `just run-all`

---------

Co-authored-by: Luke W. Johnston <lwjohnst86@users.noreply.github.com>
@lwjohnst86
Member

@signekb annoyingly, I wasn't notified of your comments (or maybe I missed them, not sure). I can't re-assign you to review, just letting you know I've updated things!

Member Author

@signekb signekb left a comment

@lwjohnst86 Nice! Just some more comments (i.e., requested changes) :)

1. Create the properties for the resource, using the original raw data
as a starting point and edit as needed.
2. Create a folder to store the (processed) data resource in your
package, as well as having a folder for the (tidy) raw data.
Member Author

Suggested change
package, as well as having a folder for the (tidy) raw data.
package, as well as having a folder for the (tidy) batch data.

Remove mentions of “raw data” within a resource.

@@ -22,6 +22,22 @@ that you have a record of the steps taken to clean and transform the
data.
:::

Putting your raw data into a data package makes it easier for yourself
and others to use later one. So the steps you'll take to get this raw
Member Author

Suggested change
and others to use later one. So the steps you'll take to get this raw
and others to use later on. So the steps you'll take to get your

Removing mentions of “raw data” within a resource.

Let's start with extracting the resource properties from the raw data.
While this function tries to infer the data types in the raw data, it might not get it right. So, be sure to check the properties after using this function. It can also not infer things that are not in the data itself, like a description of what the data contains or the unit of the data.
You'll start by creating the resource's properties. Before you can have
data stored in a data package, it needs metadata (called properties) on
Member Author

Suggested change
data stored in a data package, it needs metadata (called properties) on
data stored in a data package, it needs properties (i.e., metadata) on

I think it makes more sense to have them this way around so we consistently refer to properties and not metadata.


We've already create a package (using the steps from the [package
guide](packages.qmd)), with the path set as the variable `package_path`:
We assume you've already create a package (either by using the steps
Member Author

Suggested change
We assume you've already create a package (either by using the steps
We assume you've already created a package (either by using the steps

guide](packages.qmd)), with the path set as the variable `package_path`:
We assume you've already create a package (either by using the steps
from the [package guide](packages.qmd) or started making one for your
own data), with the path set as the variable `package_path`:
Member Author

Suggested change
own data), with the path set as the variable `package_path`:
own data), with the path to the data package set as the variable `package_path`:

this package, using the helper `path_resources()` function to give the
correct path to the resources folder. The default behaviour of
`path_resources()` is to use the current working directory, but for this
guide you'll have to use the `path` argument to point to where the
Member Author

Is this actually “we’ll” here? Bc if they’re following along locally, they should be able to use the cwd, right?
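
To make the question concrete, here is a minimal sketch of the behaviour the quoted text describes: a path helper that falls back to the current working directory unless a `path` is given. `path_resources_sketch()` and the `resources` folder name are hypothetical stand-ins for illustration, not Sprout's actual implementation.

```python
from pathlib import Path
from typing import Optional


def path_resources_sketch(path: Optional[Path] = None) -> Path:
    """Return the resources folder inside a package folder.

    Hypothetical stand-in for illustration, not Sprout's `path_resources()`.
    """
    # Fall back to the current working directory when no path is given.
    base = Path.cwd() if path is None else Path(path)
    # Assumption: resources live in a `resources` folder in the package.
    return base / "resources"


# A reader working inside the package folder can rely on the default.
print(path_resources_sketch())

# Code that runs from somewhere else has to pass `path` explicitly.
print(path_resources_sketch(Path("my-package")))
```

If readers following along locally already work from the package folder, the default would be enough for them, and only code run from another location would need the `path` argument.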

Comment on lines +193 to +194
Next step is to set up the resource properties so that it gets checked
and saved into the `datapackage.json` file. You can use the
Member Author

Suggested change
Next step is to set up the resource properties so that it gets checked
and saved into the `datapackage.json` file. You can use the
The next step is to add the resource properties to the `datapackage.json` file. Before they are added, they will be checked to confirm that they are in the correct shape and that no required fields are missing. You can use the
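
A rough sketch of the "check first, then add" order this wording describes, assuming the properties are a plain dictionary and the package file is JSON. The function names and the list of required fields are made up for illustration and are not taken from Sprout.

```python
import json
from pathlib import Path

# Assumption: the fields treated as required here are for illustration only.
REQUIRED_FIELDS = ("name", "title", "description")


def find_missing_fields(properties: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not properties.get(field)]


def add_resource_properties(properties: dict, package_file: Path) -> dict:
    """Check the properties, then append them to the package file."""
    missing = find_missing_fields(properties)
    if missing:
        # Nothing is written when the check fails.
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    package = json.loads(package_file.read_text())
    package.setdefault("resources", []).append(properties)
    package_file.write_text(json.dumps(package, indent=2))
    return package
```

Because the check runs before the file is opened for writing, a failed check leaves `datapackage.json` untouched.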

the first one in the package, so we can use `path_resource(1)`.
Next step is to set up the resource properties so that it gets checked
and saved into the `datapackage.json` file. You can use the
`path_properties()` helper function to always give you the correct
Member Author

Suggested change
`path_properties()` helper function to always give you the correct
`path_properties()` helper function to give you the

I feel like “always” is promising too much — what if they give it the wrong path, for instance.

resource_properties = sp.create_resource_properties(
properties=resource_properties,
path=package_path / sp.path_resource(1)
# TODO: This function needs to be updated to write to data package.
Member Author

Suggested change
# TODO: This function needs to be updated to write to data package.
# TODO: This function needs to be updated to write to datapackage.json

pprint(sp.read_properties(package_path / sp.path_properties()))
```

## Storing a backup of the raw data
Member Author

raw -> batch

@signekb signekb requested a review from lwjohnst86 March 11, 2025 19:30
Labels: None yet
Projects: Status: In Progress
Development: Successfully merging this pull request may close these issues: Guide on creating a resource
2 participants