326/195 documentation (#327)
* Update quickstart guide & fix apostrophe error

* Update quickstart guide

* Check for str & convert to path obj if needed in use_datastore

* Update datastore quickstart

* Pre-commit

* Updated test to cover string input

* Include info about additional options added to `build-esm-datastore`

* Update notebook - render state, no actual code changes

* Some minor documentation tweaks (also, running the doc build appears to have updated the project and storage flag lists)

* path conversions fixup

* Type hint
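
The "Check for str & convert to path obj if needed" change above can be illustrated with a minimal sketch. The helper name `as_path` and the example paths are hypothetical and chosen only for illustration; this is not the code in `access_nri_intake`, just one common way to accept either a `str` or a `pathlib.Path` argument.

```python
from pathlib import Path
from typing import Union


def as_path(value: Union[str, Path]) -> Path:
    """Return `value` as a Path, converting from str when needed.

    Hypothetical helper for illustration; not part of access_nri_intake.
    """
    if isinstance(value, str):
        return Path(value)
    return value


# Both call styles yield a Path, so later code can rely on Path operations.
catalog_dir = as_path("/tmp/catalog_dir")          # str input (assumed path)
experiment_dir = as_path(Path("/tmp/experiment"))  # Path input is returned unchanged
datastore_file = catalog_dir / "experiment_datastore.json"
```

Normalising the argument once at the top of a function keeps the rest of the body free of type checks, which is presumably why the conversion sits at the entry point of `use_datastore`.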

---------

Co-authored-by: Marc White <mwhite1206@gmail.com>
charles-turner-1 and marc-white authored Feb 7, 2025
1 parent f331cb4 commit 91068c6
Showing 7 changed files with 382 additions and 23 deletions.
349 changes: 345 additions & 4 deletions docs/datastores/quickstart.ipynb
@@ -32,20 +32,361 @@
"warnings.filterwarnings(\"ignore\") # Suppress warnings for these docs"
]
},
{
"cell_type": "markdown",
"id": "e62614a5",
"metadata": {},
"source": [
"# Building an Intake-ESM datastore - the quick way\n",
"\n",
"As of `access_nri_intake` version 1.1.0, it is possible to build an ESM-datastore from the command line, using the `build-esm-datastore` utility.\n",
"\n",
"Usage is as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c9ed0d1",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"user@local_machine $ ssh gadi \n",
"user@gadi $ mkdir catalog_dir && cd catalog_dir # Change catalog_dir to your desired directory\n",
"user@gadi $ module load conda/analysis3\n",
"user@gadi $ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir ."
]
},
{
"cell_type": "markdown",
"id": "ba0fa016",
"metadata": {},
"source": [
"This will create a new Intake-ESM catalog in the `catalog_dir` directory, using the `Mom6Builder` builder, and the experiment directory `/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/`.\n",
"\n",
"The first time you run `build-esm-datastore`, you can expect to see some output like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "349c9147",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir .\n",
"Generating esm-datastore for /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2\n",
"Building esm-datastore...\n",
"/home/189/ct1163/catalog_dir/venv/lib/python3.11/site-packages/access_nri_intake/source/utils.py:140: UserWarning: Time coordinate does not include bounds information. Guessing start and end times.\n",
" warnings.warn(\n",
"...\n",
"Sucessfully built esm-datastore!\n",
"Saving esm-datastore to /home/189/ct1163/catalog_dir\n",
"/home/189/ct1163/catalog_dir/venv/lib/python3.11/site-packages/intake_esm/cat.py:186: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/\n",
" data = self.dict().copy()\n",
"Successfully wrote ESM catalog json file to: file:///home/189/ct1163/catalog_dir/experiment_datastore.json\n",
"Hashing catalog to prevent unnecessary rebuilds.\n",
"This may take some time...\n",
"Catalog sucessfully hashed!\n",
"Datastore sucessfully written to /home/189/ct1163/catalog_dir/experiment_datastore.json!\n",
"Please note that this has not added the datastore to the access-nri-intake catalog.\n",
"To add to catalog, please run 'scaffold-catalog-entry' for help on how to do so.\n",
"To open the datastore, run `intake.open_esm_datastore('/home/189/ct1163/catalog_dir/experiment_datastore.json', columns_with_iterables=['variable'])` in a Python session.\n",
"$\n"
]
},
{
"cell_type": "markdown",
"id": "97db5843",
"metadata": {},
"source": [
"If you rerun `build-esm-datastore` and the tool detects a valid and current datastore in the specified directory, you can expect to see something like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6117aaa",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir .\n",
"Datastore found in current directory, verifying datastore integrity...\n",
"Parsing experiment dir...\n",
"Datastore integrity verified!\n",
"Datastore found in /home/189/ct1163/catalog_dir/experiment_datastore.json!\n",
"Please note that this has not added the datastore to the access-nri-intake catalog.\n",
"To add to catalog, please run 'scaffold-catalog-entry' for help on how to do so.\n",
"To open the datastore, run `intake.open_esm_datastore('/home/189/ct1163/catalog_dir/experiment_datastore.json', columns_with_iterables=['variable'])` in a Python session.\n",
"$"
]
},
{
"cell_type": "markdown",
"id": "eeee704d",
"metadata": {},
"source": [
"...or this if the tool detects that the datastore is out of date, and needs to be regenerated:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da34d1f1",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir .\n",
"Datastore found in current directory, verifying datastore integrity...\n",
"Parsing experiment dir...\n",
"Experiment directory and datastore do not match (missing files from datastore). Datastore regeneration required...\n",
"Building esm-datastore...\n",
"...\n",
"Sucessfully built esm-datastore!\n",
"Saving esm-datastore to /home/189/ct1163/catalog_dir\n",
"/home/189/ct1163/catalog_dir/venv/lib/python3.11/site-packages/intake_esm/cat.py:186: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/\n",
" data = self.dict().copy()\n",
"Successfully wrote ESM catalog json file to: file:///home/189/ct1163/catalog_dir/experiment_datastore.json\n",
"Hashing catalog to prevent unnecessary rebuilds.\n",
"This may take some time...\n",
"Catalog sucessfully hashed!\n",
"Datastore sucessfully written to /home/189/ct1163/catalog_dir/experiment_datastore.json!\n",
"Please note that this has not added the datastore to the access-nri-intake catalog.\n",
"To add to catalog, please run 'scaffold-catalog-entry' for help on how to do so.\n",
"To open the datastore, run `intake.open_esm_datastore('/home/189/ct1163/catalog_dir/experiment_datastore.json', columns_with_iterables=['variable'])` in a Python session."
]
},
{
"cell_type": "markdown",
"id": "a591b4e0",
"metadata": {},
"source": [
"To see the full list of options, run `build-esm-datastore --help`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdb56801",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"$ build-esm-datastore --help\n",
"usage: build-esm-datastore [-h] [--builder BUILDER] [--builder-kwargs [BUILDER_KWARGS ...]] [--expt-dir EXPT_DIR]\n",
" [--cat-dir CAT_DIR] [--datastore-name DATASTORE_NAME] [--description DESCRIPTION]\n",
"\n",
"Build an esm-datastore by inspecting a directory containing model outputs. If no datastore exists, a new one will be\n",
"created. If a datastore exists, it's integrity will be verified, and the datastore regenerated if necessary.\n",
"\n",
"options:\n",
" -h, --help show this help message and exit\n",
" --builder BUILDER Builder to use to create the esm-datastore. Builders are defined the source.builders module.\n",
" Currently available options are: AccessOm2Builder, AccessOm3Builder, Mom6Builder,\n",
" AccessEsm15Builder, AccessCm2Builder. To build a datastore for a new model, please contact the\n",
" ACCESS-NRI team.\n",
" --builder-kwargs [BUILDER_KWARGS ...]\n",
" Additional keyword arguments to pass to the builder. Should be in the form of key=value.\n",
" --expt-dir EXPT_DIR Directory containing the model outputs to be added to the esm-datastore. Defaults to the\n",
" current working directory. Although builders support adding multiple directories, this tool\n",
" only supports one directory at a time - at present.\n",
" --cat-dir CAT_DIR Directory in which to place the catalog.json file. Defaults to the value of --expt-dir if not\n",
" set.\n",
" --datastore-name DATASTORE_NAME\n",
" Name of the datastore to use. If not provided, this will default to 'experiment_datastore'.\n",
" --description DESCRIPTION\n",
" Description of the datastore. If not provided, a default description will be used:\n",
" 'esm_datastore for the model output in {--expt-dir}'"
]
},
{
"cell_type": "markdown",
"id": "db1869d1",
"metadata": {},
"source": [
"If you want to place multiple datastores in the same directory, you will need to specify different datastore names, using the `--datastore-name` option. For example:\n",
"\n",
"```bash\n",
"$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir . --datastore-name mom6_panant_01\n",
"...\n",
"$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-02-zstar-ACCESSyr2/ --cat-dir . --datastore-name mom6_panant_02\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "c8290d6d",
"metadata": {},
"source": [
"In addition, you can access the `build-esm-datastore` functionality from within a Python script, using the `use_datastore` function:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5de825b9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[22mDatastore found in \u001b[36m\u001b[1m/home/189/ct1163/catalog_dir\u001b[34m\u001b[22m, verifying datastore integrity...\u001b[0m\n",
"\u001b[34m\u001b[22mParsing experiment dir...\u001b[0m\n",
"\u001b[32m\u001b[22mDatastore integrity verified!\u001b[0m\n",
"\u001b[32m\u001b[22mDatastore found in \u001b[36m\u001b[1m/home/189/ct1163/catalog_dir/experiment_datastore.json\u001b[32m\u001b[22m!\n",
"\u001b[34m\u001b[22mPlease note that this has not added the datastore to the access-nri-intake catalog.\n",
"To add to catalog, please run '\u001b[37m\u001b[1mscaffold_catalog_entry\u001b[34m\u001b[22m' for help on how to do so.\n"
]
},
{
"data": {
"text/html": [
"<p><strong>experiment_datastore catalog with 13 dataset(s) from 12325 asset(s)</strong>:</p> <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>unique</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>filename</th>\n",
" <td>12325</td>\n",
" </tr>\n",
" <tr>\n",
" <th>file_id</th>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>path</th>\n",
" <td>12325</td>\n",
" </tr>\n",
" <tr>\n",
" <th>filename_timestamp</th>\n",
" <td>82</td>\n",
" </tr>\n",
" <tr>\n",
" <th>frequency</th>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>start_date</th>\n",
" <td>3977</td>\n",
" </tr>\n",
" <tr>\n",
" <th>end_date</th>\n",
" <td>3978</td>\n",
" </tr>\n",
" <tr>\n",
" <th>variable</th>\n",
" <td>122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>variable_long_name</th>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>variable_standard_name</th>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>variable_cell_methods</th>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>variable_units</th>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>realm</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>derived_variable</th>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from access_nri_intake.experiment import use_datastore\n",
"from access_nri_intake.source.builders import Mom6Builder\n",
"\n",
"ds = use_datastore(\n",
"    experiment_dir=\"/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/\",\n",
"    catalog_dir=\"/home/189/ct1163/catalog_dir/\",\n",
"    builder=Mom6Builder,\n",
"    datastore_name=\"experiment_datastore\",\n",
"    description=\"PanAnt experiment with ACCESS-OM2-01 forcing\",\n",
")\n",
"ds"
]
},
{
"cell_type": "markdown",
"id": "31c9ac00",
"metadata": {},
"source": [
"For even more fine-grained control, follow the guide below:"
]
},
{
"cell_type": "markdown",
"id": "c1526d2b-06b8-46e3-9005-638c04844c6e",
"metadata": {},
"source": [
"## Building an Intake-ESM datastore"
"## Building an Intake-ESM datastore - using builders directly"
]
},
{
"cell_type": "markdown",
"id": "9f8f5cd3-93bf-4612-afc9-54ac6b2ce516",
"metadata": {},
"source": [
"In this tutorial, we'll build an Intake-ESM datastore for an ACCESS-OM2 model run that is currently not included in the ACCESS-NRI catalog. The base output directory for this model run is:\n",
"In the rest of this tutorial, we'll build an Intake-ESM datastore for an ACCESS-OM2 model run that is currently not included in the ACCESS-NRI catalog. The base output directory for this model run is:\n",
"\n",
"`/g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126`\n",
"\n",
@@ -463,7 +804,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -477,7 +818,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.11.9"
}
},
"nbformat": 4,