diff --git a/docs/datastores/quickstart.ipynb b/docs/datastores/quickstart.ipynb index 7b2e048c..03710350 100644 --- a/docs/datastores/quickstart.ipynb +++ b/docs/datastores/quickstart.ipynb @@ -32,12 +32,353 @@ "warnings.filterwarnings(\"ignore\") # Suppress warnings for these docs" ] }, + { + "cell_type": "markdown", + "id": "e62614a5", + "metadata": {}, + "source": [ + "# Building an Intake-ESM datastore - the quick way\n", + "\n", + "As of `access_nri_intake` version 1.1.0, it is possible to build an ESM-datastore from the command line, using the `build-esm-datastore` utility.\n", + "\n", + "Usage is as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c9ed0d1", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "user@local_machine $ ssh gadi \n", + "user@gadi $ mkdir catalog_dir && cd catalog_dir # Change catalog_dir to your desired directory\n", + "user@gadi $ module load conda/analysis3\n", + "user@gadi $ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir ." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ba0fa016", + "metadata": {}, + "source": [ + "This will create a new Intake-ESM catalog in the `catalog_dir` directory, using the `Mom6Builder` builder, and the experiment directory `/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/`.\n", + "\n", + "The first time you run `build-esm-datastore`, you can expect to see some output like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "349c9147", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir .\n", + "Generating esm-datastore for /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2\n", + "Building esm-datastore...\n", + "/home/189/ct1163/catalog_dir/venv/lib/python3.11/site-packages/access_nri_intake/source/utils.py:140: UserWarning: Time coordinate does not include bounds information. Guessing start and end times.\n", + " warnings.warn(\n", + "...\n", + "Sucessfully built esm-datastore!\n", + "Saving esm-datastore to /home/189/ct1163/catalog_dir\n", + "/home/189/ct1163/catalog_dir/venv/lib/python3.11/site-packages/intake_esm/cat.py:186: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. 
See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/\n", + " data = self.dict().copy()\n", + "Successfully wrote ESM catalog json file to: file:///home/189/ct1163/catalog_dir/experiment_datastore.json\n", + "Hashing catalog to prevent unnecessary rebuilds.\n", + "This may take some time...\n", + "Catalog sucessfully hashed!\n", + "Datastore sucessfully written to /home/189/ct1163/catalog_dir/experiment_datastore.json!\n", + "Please note that this has not added the datastore to the access-nri-intake catalog.\n", + "To add to catalog, please run 'scaffold-catalog-entry' for help on how to do so.\n", + "To open the datastore, run `intake.open_esm_datastore('/home/189/ct1163/catalog_dir/experiment_datastore.json', columns_with_iterables=['variable'])` in a Python session.\n", + "$\n" + ] + }, + { + "cell_type": "markdown", + "id": "97db5843", + "metadata": {}, + "source": [ + "If you rerun `build-esm-datastore` and the tool detects a valid, up-to-date datastore in the specified directory, you can expect to see something like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6117aaa", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir .\n", + "Datastore found in current directory, verifying datastore integrity...\n", + "Parsing experiment dir...\n", + "Datastore integrity verified!\n", + "Datastore found in /home/189/ct1163/catalog_dir/experiment_datastore.json!\n", + "Please note that this has not added the datastore to the access-nri-intake catalog.\n", + "To add to catalog, please run 'scaffold-catalog-entry' for help on how to do so.\n", + "To open the datastore, run `intake.open_esm_datastore('/home/189/ct1163/catalog_dir/experiment_datastore.json', columns_with_iterables=['variable'])` in a Python session.\n", + "$" + ] + }, + { + 
"cell_type": "markdown", + "id": "eeee704d", + "metadata": {}, + "source": [ + "...or this if the tool detects that the datastore is out of date, and needs to be regenerated:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da34d1f1", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir .\n", + "Datastore found in current directory, verifying datastore integrity...\n", + "Parsing experiment dir...\n", + "Experiment directory and datastore do not match (missing files from datastore). Datastore regeneration required...\n", + "Building esm-datastore...\n", + "...\n", + "Sucessfully built esm-datastore!\n", + "Saving esm-datastore to /home/189/ct1163/catalog_dir\n", + "/home/189/ct1163/catalog_dir/venv/lib/python3.11/site-packages/intake_esm/cat.py:186: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/\n", + " data = self.dict().copy()\n", + "Successfully wrote ESM catalog json file to: file:///home/189/ct1163/catalog_dir/experiment_datastore.json\n", + "Hashing catalog to prevent unnecessary rebuilds.\n", + "This may take some time...\n", + "Catalog sucessfully hashed!\n", + "Datastore sucessfully written to /home/189/ct1163/catalog_dir/experiment_datastore.json!\n", + "Please note that this has not added the datastore to the access-nri-intake catalog.\n", + "To add to catalog, please run 'scaffold-catalog-entry' for help on how to do so.\n", + "To open the datastore, run `intake.open_esm_datastore('/home/189/ct1163/catalog_dir/experiment_datastore.json', columns_with_iterables=['variable'])` in a Python session." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a591b4e0", + "metadata": {}, + "source": [ + "To see the full list of options, run `build-esm-datastore --help`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bdb56801", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "$ build-esm-datastore --help\n", + "usage: build-esm-datastore [-h] [--builder BUILDER] [--builder-kwargs [BUILDER_KWARGS ...]] [--expt-dir EXPT_DIR]\n", + " [--cat-dir CAT_DIR] [--datastore-name DATASTORE_NAME] [--description DESCRIPTION]\n", + "\n", + "Build an esm-datastore by inspecting a directory containing model outputs. If no datastore exists, a new one will be\n", + "created. If a datastore exists, it's integrity will be verified, and the datastore regenerated if necessary.\n", + "\n", + "options:\n", + " -h, --help show this help message and exit\n", + " --builder BUILDER Builder to use to create the esm-datastore. Builders are defined the source.builders module.\n", + " Currently available options are: AccessOm2Builder, AccessOm3Builder, Mom6Builder,\n", + " AccessEsm15Builder, AccessCm2Builder. To build a datastore for a new model, please contact the\n", + " ACCESS-NRI team.\n", + " --builder-kwargs [BUILDER_KWARGS ...]\n", + " Additional keyword arguments to pass to the builder. Should be in the form of key=value.\n", + " --expt-dir EXPT_DIR Directory containing the model outputs to be added to the esm-datastore. Defaults to the\n", + " current working directory. Although builders support adding multiple directories, this tool\n", + " only supports one directory at a time - at present.\n", + " --cat-dir CAT_DIR Directory in which to place the catalog.json file. Defaults to the value of --expt-dir if not\n", + " set.\n", + " --datastore-name DATASTORE_NAME\n", + " Name of the datastore to use. 
If not provided, this will default to 'experiment_datastore'.\n", + " --description DESCRIPTION\n", + " Description of the datastore. If not provided, a default description will be used:\n", + " 'esm_datastore for the model output in {--expt-dir}'" + ] + }, + { + "cell_type": "markdown", + "id": "db1869d1", + "metadata": {}, + "source": [ + "If you want to place multiple datastores in the same directory, you will need to give each one a different name, using the `--datastore-name` option. For example:\n", + "\n", + "```bash\n", + "$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/ --cat-dir . --datastore-name mom6_panant_01\n", + "...\n", + "$ build-esm-datastore --builder Mom6Builder --expt-dir /g/data/ik11/outputs/mom6-panan/panant-02-zstar-ACCESSyr2/ --cat-dir . --datastore-name mom6_panant_02\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "id": "c8290d6d", + "metadata": {}, + "source": [ + "You can also access the `build-esm-datastore` functionality from within a Python script, using the `use_datastore` function:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "5de825b9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[34m\u001b[22mDatastore found in \u001b[36m\u001b[1m/home/189/ct1163/catalog_dir\u001b[34m\u001b[22m, verifying datastore integrity...\u001b[0m\n", + "\u001b[34m\u001b[22mParsing experiment dir...\u001b[0m\n", + "\u001b[32m\u001b[22mDatastore integrity verified!\u001b[0m\n", + "\u001b[32m\u001b[22mDatastore found in \u001b[36m\u001b[1m/home/189/ct1163/catalog_dir/experiment_datastore.json\u001b[32m\u001b[22m!\n", + "\u001b[34m\u001b[22mPlease note that this has not added the datastore to the access-nri-intake catalog.\n", + "To add to catalog, please run '\u001b[37m\u001b[1mscaffold_catalog_entry\u001b[34m\u001b[22m' for help on how to do so.\n" + ] + }, + { + "data": { + "text/html": [ 
+ "<p><strong>experiment_datastore catalog with 13 dataset(s) from 12325 asset(s)</strong>:</p>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>unique</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>filename</th><td>12325</td></tr>\n",
+ "    <tr><th>file_id</th><td>13</td></tr>\n",
+ "    <tr><th>path</th><td>12325</td></tr>\n",
+ "    <tr><th>filename_timestamp</th><td>82</td></tr>\n",
+ "    <tr><th>frequency</th><td>3</td></tr>\n",
+ "    <tr><th>start_date</th><td>3977</td></tr>\n",
+ "    <tr><th>end_date</th><td>3978</td></tr>\n",
+ "    <tr><th>variable</th><td>122</td></tr>\n",
+ "    <tr><th>variable_long_name</th><td>17</td></tr>\n",
+ "    <tr><th>variable_standard_name</th><td>17</td></tr>\n",
+ "    <tr><th>variable_cell_methods</th><td>17</td></tr>\n",
+ "    <tr><th>variable_units</th><td>17</td></tr>\n",
+ "    <tr><th>realm</th><td>2</td></tr>\n",
+ "    <tr><th>derived_variable</th><td>0</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "