Download docs #166

Merged
merged 8 commits into from
Dec 19, 2024
1 change: 1 addition & 0 deletions changelog/166.docs.md
@@ -0,0 +1 @@
Added basic examples of how to download data from ESGF.
2 changes: 1 addition & 1 deletion docs/index.md
@@ -5,7 +5,7 @@
For:

- different, pre-prepared views of the database, see [database views](database-views/index.md)
- usage as a data user, see [usage - data user](usage-data-user.md)
- usage as a data user, see [usage - data user](usage-data-user/index.md)
- usage as a data producer, see [usage - data producer](usage-data-producer.md)
- an overview of the repository, see [repository overview](repository-overview.md)
- guidance on how to contribute to the repository, see [contributing](contributing.md)
100 changes: 100 additions & 0 deletions docs/usage-data-user/downloading.md
@@ -0,0 +1,100 @@
# Downloading data

Here we provide some examples of how to download data.
They are not meant to be exhaustive, but they may help.

## Understanding what you need

Unfortunately, given the variety of models, approaches and requirements,
there is no uniform set of flags that can be used across all datasets
to 'just get the latest CMIP DECK data'.
Instead, please check the information for each dataset to see what is available.
These pages include information about available grids and frequencies
and what to use for the pre-industrial control experiments.
[The datasets overview page is here](../dataset-overviews/index.md),
from which you can access the information page for each individual dataset.

## esgpull - command-line ESGF download software

It is possible to download datasets using [esgpull](https://esgf.github.io/esgf-download/).
The installation instructions are [here](https://esgf.github.io/esgf-download/installation/).

Having installed esgpull, make sure it is configured on your system with

```sh
esgpull self install
```

Next, we found that we had to set the index node explicitly
in order for esgpull to find input4MIPs data.

```sh
esgpull config api.index_node esgf-node.llnl.gov
```

Data can then be downloaded as shown below.
The key thing is to make sure that you are getting the source ID you are interested in.
(The example below uses shell commands.
You can, of course, drive the shell from your programming language of choice,
which may be more convenient,
particularly if you require specific combinations of grids and variables.)

```sh
CMIP7_VERSION_PROJECT="input4MIPs"
# The MIP era will need to be changed to "CMIP7" when the final data is published
# (for now, all testing data is published under "CMIP6Plus").
CMIP7_VERSION_MIP_ERA="CMIP6Plus"
# The source ID you're interested in.
CMIP7_VERSION_SOURCE_ID="CR-CMIP-0-4-0"
SEARCH_TAG="cmip7-${CMIP7_VERSION_SOURCE_ID}"

esgpull add --tag "${SEARCH_TAG}" --track \
    "project:${CMIP7_VERSION_PROJECT}" \
    "mip_era:${CMIP7_VERSION_MIP_ERA}" \
    "source_id:${CMIP7_VERSION_SOURCE_ID}"

esgpull update -y --tag "${SEARCH_TAG}"

# Be careful before running this: it may download a lot of data.
# See below for how to restrict the search.
esgpull download --tag "${SEARCH_TAG}"
```

If you only want to download data of a specific type,
you can do that too.
For example, to download only global-, annual-mean greenhouse gas concentrations,
add the grid label and frequency to the search as shown below
(as discussed at the top of this page, knowing that this is possible is not obvious,
and unfortunately there is no uniform guidance that applies to all forcings).

```sh
# Reuses CMIP7_VERSION_PROJECT, CMIP7_VERSION_MIP_ERA and CMIP7_VERSION_SOURCE_ID
# from the previous example.
GRID_LABEL="gn"
FREQUENCY="yr"
SEARCH_TAG="cmip7-${CMIP7_VERSION_SOURCE_ID}-${GRID_LABEL}-${FREQUENCY}"

esgpull add --tag "${SEARCH_TAG}" --track \
    "project:${CMIP7_VERSION_PROJECT}" \
    "mip_era:${CMIP7_VERSION_MIP_ERA}" \
    "source_id:${CMIP7_VERSION_SOURCE_ID}" \
    "grid_label:${GRID_LABEL}" \
    "frequency:${FREQUENCY}"

esgpull update -y --tag "${SEARCH_TAG}"

esgpull download --tag "${SEARCH_TAG}"
```
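If you need several grid/frequency combinations, the commands above can be wrapped in a small function and called in a loop. The sketch below is one way to do this in plain shell; the function name `cmip7_fetch`, the `DOWNLOAD` switch and the combinations in the loop are our own illustrative assumptions (not part of esgpull or of any dataset's guidance), so please check each dataset's page for the combinations that actually exist.

```sh
# Hypothetical helper (not part of esgpull): track, update and optionally
# download one grid_label/frequency combination for a given source ID.
# Defaults mirror the examples above; export the variables to override them.
CMIP7_VERSION_PROJECT="${CMIP7_VERSION_PROJECT:-input4MIPs}"
CMIP7_VERSION_MIP_ERA="${CMIP7_VERSION_MIP_ERA:-CMIP6Plus}"
CMIP7_VERSION_SOURCE_ID="${CMIP7_VERSION_SOURCE_ID:-CR-CMIP-0-4-0}"
# DOWNLOAD=0 (the default) only tracks and updates the searches;
# set DOWNLOAD=1 to also download the files.
DOWNLOAD="${DOWNLOAD:-0}"

cmip7_fetch() {
    grid_label="$1"
    frequency="$2"
    search_tag="cmip7-${CMIP7_VERSION_SOURCE_ID}-${grid_label}-${frequency}"

    esgpull add --tag "${search_tag}" --track \
        "project:${CMIP7_VERSION_PROJECT}" \
        "mip_era:${CMIP7_VERSION_MIP_ERA}" \
        "source_id:${CMIP7_VERSION_SOURCE_ID}" \
        "grid_label:${grid_label}" \
        "frequency:${frequency}"
    esgpull update -y --tag "${search_tag}"

    if [ "${DOWNLOAD}" = "1" ]; then
        esgpull download --tag "${search_tag}"
    fi
}

# Illustrative combinations only: check the dataset's page for the ones that exist.
for combo in "gn yr" "gn mon"; do
    # Deliberately unquoted so that each pair splits into two arguments.
    cmip7_fetch ${combo}
done
```

Keeping the download step behind a switch means you can first inspect what each tagged search found before committing to a potentially large download.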

If you want to see this flow being used within a wider repository,
please see [climate-resource/CMIP6-vs-CMIP7-GHG-Concentrations](https://github.com/climate-resource/CMIP6-vs-CMIP7-GHG-Concentrations).

## Directly from ESGF

The ESGF MetaGrid search interface ([https://aims2.llnl.gov/search/input4MIPs](https://aims2.llnl.gov/search/input4MIPs))
provides direct access to the ESGF-hosted datasets.
It allows searching via search facets such as
`MIP Era`, `Target MIP`, `Institution ID` and `Source ID`.
For `Target MIP`, for example, "CMIP" means the CMIP DECK activity
and "ScenarioMIP" means the scenarios.

You can download files directly by browsing down to the file level.
Alternatively, you can select a `wget` script, which downloads via the command line
and selects all available files that comprise a dataset
(a collection of files defined by a unique combination of the search facets).

By default, only the latest (non-deprecated, non-errata) data are listed in a search.
You can also limit your search to a particular (e.g. nearby) ESGF node,
for example `aims3.llnl.gov` or `esgf-data2.llnl.gov` on the West Coast of the USA,
or `esgf1.dkrz.de` in Hamburg, Germany.
We are expecting more ESGF nodes to begin replicating these data as the project matures.
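The same facets can, in principle, also be queried without the web interface. The sketch below assumes that the index node exposes the classic ESGF `esg-search` REST API at `https://esgf-node.llnl.gov/esg-search/search` (the node used in the esgpull example above); that endpoint, the function name `esgf_search_url` and the example facet values are assumptions, so please check your node's documentation before relying on it. As far as we know, the API facet keys are the lower-case forms of the MetaGrid labels (`mip_era`, `target_mip`, `institution_id`, `source_id`), `latest=true` mirrors the web interface's default of listing only the latest data, and `data_node` restricts results to a single node.

```sh
# Hypothetical helper: build an ESGF search-API URL for input4MIPs datasets.
# Assumes the classic esg-search API; override INDEX_NODE for another node.
INDEX_NODE="${INDEX_NODE:-esgf-node.llnl.gov}"

# Usage: esgf_search_url MIP_ERA TARGET_MIP SOURCE_ID [DATA_NODE]
esgf_search_url() {
    url="https://${INDEX_NODE}/esg-search/search?type=Dataset&project=input4MIPs"
    url="${url}&mip_era=$1&target_mip=$2&source_id=$3"
    # Only the latest versions, like the web interface's default.
    url="${url}&latest=true"
    if [ -n "${4:-}" ]; then
        # Restrict results to datasets held on one data node.
        url="${url}&data_node=$4"
    fi
    url="${url}&format=application%2Fsolr%2Bjson&limit=100"
    printf '%s\n' "${url}"
}

# Example (illustrative values): query datasets for one source ID held in Hamburg.
# curl -s "$(esgf_search_url CMIP6Plus CMIP CR-CMIP-0-4-0 esgf1.dkrz.de)"
```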
43 changes: 34 additions & 9 deletions docs/usage-data-user.md → docs/usage-data-user/index.md
@@ -10,22 +10,22 @@ so please don't hesitate to ask.
## The datasets

An overview of each dataset, with links to further information,
can be found in [dataset overviews](dataset-overviews/index.md).
can be found in [dataset overviews](../dataset-overviews/index.md).

## Navigating the database

The database tracks all of the files[^1] being managed in the input4MIPs project[^2].
In general, as a user, you won't be interested in information at the level of individual files,
hence we provide different views.
An overview of these is given in the
[database views homepage](database-views/index.md).
[database views homepage](../database-views/index.md).
Here we provide some more targeted guidance for users of the data.

[^1]: At the moment, just data from the 'CMIP6Plus' era, but we hope to expand this out to data from the CMIP6 era in future.
[^1]: At the moment, only data from the 'CMIP6Plus' era is tracked. We hope to expand this to data from the CMIP6 era in future and, early in 2025, to the final datasets for CMIP7.
[^2]: Yes, this does somewhat duplicate the point of the ESGF index, but the ESGF index isn't publicly accessible/queryable and doesn't have all the information we want right now, so here we are.

If you want to know about the latest status of each dataset,
have a look at [the delivery summary](database-views/input4MIPs_delivery-summary_CMIP6Plus.html).
have a look at [the delivery summary](../database-views/input4MIPs_delivery-summary_CMIP6Plus.html).
This page provides, for each forcing dataset:

- its current status (see the `Status` column)
@@ -38,13 +38,13 @@ This page provides, for each forcing dataset:
this is provided as a hyperlink on the forcing dataset's name
(see the `Forcing dataset` column),
i.e. if you can click on the forcing dataset's name,
it will take you to that forcing dataset's home page.
it will take you to that forcing dataset's home page, wherever that is.

### How can I get more information about each dataset?

Beyond the overviews above, you can also use the different views of our database.
If you are interested in the status of different versions of a particular dataset,
then it is worth looking at [the source ID level view](database-views/input4MIPs_source-id_CMIP6Plus.html).
then it is worth looking at [the source ID level view](../database-views/input4MIPs_source-id_CMIP6Plus.html).
Within this view, the search bar can be used to filter just for the dataset you're interested in.
Once this filtering is done, a few columns are particularly relevant:

@@ -59,12 +59,37 @@ Once this filtering is done, a few columns are particularly relevant:
(either because it has never been published or because it has been retracted post-publication).

If you wish to dive even further, you can use
[the dataset level view](database-views/input4MIPs_datasets_CMIP6Plus.html)
[the dataset level view](../database-views/input4MIPs_datasets_CMIP6Plus.html)
to get information at the level of individual datasets (generally, variable-level)
and [the file level view](database-views/input4MIPs_files_CMIP6Plus.html)
and [the file level view](../database-views/input4MIPs_files_CMIP6Plus.html)
to get information at the level of individual files (unlikely to be of use in the majority of cases).

## CVs
### How can I provide feedback on an existing dataset?

We welcome comments and feedback on existing datasets. Such analysis and review will uncover issues
that we have yet to identify. Once we know about an issue, it is far more likely to get
attention and a resolution (most likely a shiny new dataset that fixes the issue).

To report a problem, browse to the [input4MIPs_CVs issue page](https://github.com/PCMDI/input4MIPs_CVs/issues)
and open a new issue, preferably with a descriptive title that identifies the problem dataset by `source_id`
(e.g., PCMDI-AMIP-1-1-9). Alternatively, you can open a discussion on the [input4MIPs discussion page](https://github.com/PCMDI/input4MIPs_CVs/discussions)
to connect with the data providers and other users about usage questions or other topics that don't warrant
an issue (yet). We can always convert a discussion to an issue if required.
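If you use the GitHub CLI (`gh`), you can also open an issue from the command line. The helper below is a hypothetical convenience that puts the `source_id` at the start of the title, following the suggestion above; `gh issue create` with `--repo`, `--title` and `--body` is a standard `gh` command, but the function name and the title format are our own assumptions.

```sh
# Hypothetical helper: open an input4MIPs_CVs issue about one dataset.
# Requires the GitHub CLI (gh), installed and authenticated (gh auth login).
# Usage: report_dataset_issue SOURCE_ID "short summary" "longer description"
report_dataset_issue() {
    source_id="$1"
    summary="$2"
    body="$3"
    gh issue create \
        --repo PCMDI/input4MIPs_CVs \
        --title "${source_id}: ${summary}" \
        --body "${body}"
}

# Example (placeholder text):
# report_dataset_issue PCMDI-AMIP-1-1-9 "short summary of the problem" "What you found and how."
```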

To date, we have already identified issues with, and resolved problems in, earlier versions of the data.
These issues are briefly described in the [input4MIPs source_id view](https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/latest/database-views/input4MIPs_source-id_CMIP6Plus.html):
search for `retracted` (in the search box at the top right), or for other criteria that caught issues
before data was made available on ESGF.

### How can I be notified when new data is published or errors in the data are identified?

You can "subscribe" to be notified of changes to the content of this input4MIPs_CVs repository.
On the [repository homepage](https://github.com/PCMDI/input4MIPs_CVs/), the "Watch" button
lets you control the granularity of your notifications about changes to the repository
content. For more information, see GitHub's [Configuring Notifications](https://docs.github.com/en/account-and-profile/managing-subscriptions-and-notifications-on-github/setting-up-notifications/configuring-notifications)
page.
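If you prefer the command line, the GitHub REST API's repository-subscription endpoints can set the same "watch" state. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated; the function names are hypothetical. It watches all activity on the repository: as far as we know, the finer-grained choices in the "Watch" menu (e.g. custom notifications) are only available through the web interface.

```sh
# Sketch: watch or unwatch a repository via the GitHub REST API.
# Assumes gh is installed and authenticated; function names are hypothetical.
REPO="PCMDI/input4MIPs_CVs"

watch_repo() {
    # PUT /repos/{owner}/{repo}/subscription; -F sends "true" as a JSON boolean.
    gh api --method PUT "repos/${1}/subscription" -F subscribed=true
}

unwatch_repo() {
    # DELETE /repos/{owner}/{repo}/subscription stops watching.
    gh api --method DELETE "repos/${1}/subscription"
}

# Example:
# watch_repo "${REPO}"
```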

## input4MIPs Controlled Vocabularies (CVs)

It is unlikely that you will need to use the CVs directly,
although they may be helpful for understanding what the different terms mean
4 changes: 3 additions & 1 deletion mkdocs.yml
@@ -9,7 +9,9 @@ repo_url: https://github.com/PCMDI/input4MIPs_CVs
nav:
- input4MIPs CVs: index.md
- Usage:
- As a data user: usage-data-user.md
- As a data user:
- Overview: usage-data-user/index.md
- Downloading data: usage-data-user/downloading.md
- As a data producer: usage-data-producer.md
- Dataset overviews:
- Summary: dataset-overviews/index.md