diff --git a/.conda/meta.yaml b/.conda/meta.yaml index f67eb207..ad37cb80 100644 --- a/.conda/meta.yaml +++ b/.conda/meta.yaml @@ -22,7 +22,8 @@ requirements: - python >=3.9 - cftime - ecgtools>=2022.10.07 - - intake-dataframe-catalog>=0.1.1 + - intake>=0.7.0 + - intake-dataframe-catalog>=0.2.2 - intake-esm>=2023.4.20 - jsonschema - pooch diff --git a/.gitignore b/.gitignore index 68c2448a..da4e917f 100644 --- a/.gitignore +++ b/.gitignore @@ -130,3 +130,4 @@ dmypy.json sandpit.ipynb *.DS_Store +bin/build_all.sh.o* diff --git a/bin/build_all.sh b/bin/build_all.sh index ff413db4..b2887c61 100755 --- a/bin/build_all.sh +++ b/bin/build_all.sh @@ -9,32 +9,28 @@ #PBS -l wd #PBS -j oe -##################################################### +########################################################################################### # Copyright 2022 ACCESS-NRI and contributors. See the top-level COPYRIGHT file for details. # SPDX-License-Identifier: Apache-2.0 # Description: -# Update all intake catalogs from config files -# -##################################################### +# Generate access-nri intake metacatalog from config files + +########################################################################################### set -e +if [ ! 
$# -eq 0 ]; then + version=$1 +fi + conda activate access-nri-intake-dev OUTPUT_BASE_PATH=/g/data/tm70/intake CONFIG_DIR=/g/data/tm70/ds0092/projects/access-nri-intake-catalog/config -CONFIGS=( cmip6.yaml cmip5.yaml access-om2.yaml access-cm2.yaml access-esm1-5.yaml ) # erai.yaml - -# Get current version and set up the directories -version=$(python ../setup.py --version) -version_path=${OUTPUT_BASE_PATH}/v${version} -build_path=${version_path}/sources -mkdir ${version_path} -mkdir ${build_path} +CONFIGS=( cmip5.yaml cmip6.yaml access-om2.yaml access-cm2.yaml access-esm1-5.yaml ) # erai.yaml -metacatalog_file=${version_path}/metacatalog.csv config_paths=( "${CONFIGS[@]/#/${CONFIG_DIR}/}" ) -metacat-build --build_path=${build_path} --metacatalog_file=${metacatalog_file} ${config_paths[@]} +metacat-build --build_base_path=${OUTPUT_BASE_PATH} --version=${version} ${config_paths[@]} diff --git a/docs/development/contrib_code.rst b/docs/development/contrib_code.rst index c6b8dc04..250a4015 100644 --- a/docs/development/contrib_code.rst +++ b/docs/development/contrib_code.rst @@ -1,7 +1,8 @@ Contributing code ================= -Code contributions are handled through "pull requests" on GitHub. The following describes how to go about making your contributions and submitting a pull request. +Code contributions are handled through "pull requests" on GitHub. The following describes how to go about making your +contributions and submitting a pull request. #. Fork this respository. @@ -17,11 +18,13 @@ Code contributions are handled through "pull requests" on GitHub. The following $ conda env create -f environment-dev.yml $ conda activate access-nri-intake-dev -#. Install `access-nri-intake` using the editable flag (meaning any changes you make to the package will be reflected directly in your environment without having to reinstall):: +#. 
Install `access-nri-intake` using the editable flag (meaning any changes you make to the package will be reflected +directly in your environment without having to reinstall):: $ pip install --no-deps -e . -#. This project uses `black` to format code and `flake8` for linting. We use `pre-commit` to ensure these have been run. Please set up commit hooks by running the following. This will mean that `black` and `flake8` are run whenever you make a commit:: +#. This project uses `black` to format code and `flake8` for linting. We use `pre-commit` to ensure these have been run. +Please set up commit hooks by running the following. This will mean that `black` and `flake8` are run whenever you make a commit:: pre-commit install @@ -29,23 +32,50 @@ You can also run `pre-commit` manually at any point to format your code:: pre-commit run --all-files -#. Start making and committing your edits, including adding docstrings to functions and adding unit tests to check that your contributions are doing what they're suppose to. Please try to follow `numpydoc style `_ for docstrings. To run the test suite:: +#. Start making and committing your edits, including adding docstrings to functions and adding unit tests to check that +your contributions are doing what they're supposed to. Please try to follow `numpydoc style +`_ for docstrings. To run the test suite:: pytest src -#. Once you are happy with your contribution, navigate to `here `_ and open a new pull request to merge your branch of your fork with the main branch of the base. +#. Once you are happy with your contribution, navigate to `here `_ +and open a new pull request to merge your branch of your fork with the main branch of the base. Preparing a new release ----------------------- -New code releases to PyPI and conda are published automatically when a tag is pushed to Github. A corresponding version of the catalog files on Gadi must also be generated. 
To publish a new release:: +New releases to PyPI and conda are published automatically when a tag is pushed to GitHub. A new release may or may not include +an update to the catalog files on Gadi and associated +`data package `_ module :code:`access_nri_intake.cat`. If it does, +the person doing the release must ensure that the version of the new catalog matches the version of the new release by carefully +following all steps below. Ideally steps 1 and 2 below will be done in a PR and merged before commencing step 3. If the release +does not include an update to the catalog on Gadi, skip the first two steps below: + +#. [OPTIONAL] Create a new version of the catalog on Gadi (this will take about 45 mins):: $ export RELEASE=vX.X.X - $ # Create git tags + $ cd bin + $ qsub -v version=${RELEASE} build_all.sh + +#. [OPTIONAL] Upon successful completion of the previous step, the :code:`access_nri_intake` data package module will be updated + to point at the new version just created. Commit this update:: + + $ cd ../ + $ git add src/access_nri_intake/cat + $ git commit -m "Update catalog to $RELEASE" + +#. Go to https://github.com/ACCESS-NRI/access-nri-intake-catalog + +#. Click on "Releases"/"Draft new release" on the right-hand side of the screen + +#. Enter the new version (vX.X.X) as the tag and release title. Add a brief description of the release. + +#. Click on "Publish release". This should create the release on GitHub and trigger the workflow that builds and uploads + the new version to PyPI and conda. + +Alternatively (though discouraged), one can trigger the new release from the command line. 
Replace steps 3 onwards with:: + + $ git fetch --all --tags $ git commit --allow-empty -m "Release $RELEASE" $ git tag -a $RELEASE -m "Version $RELEASE" - $ # Build the corresponding version of the catalog (make sure this job finishes successfully before progressing) - $ cd bin - $ qsub build_all.sh - $ # Push the tag to github to trigger the code release $ git push --tags diff --git a/docs/how_tos/building_intake-esm_catalogs.ipynb b/docs/how_tos/building_intake-esm_catalogs.ipynb index 0ceda0cb..fca917b2 100644 --- a/docs/how_tos/building_intake-esm_catalogs.ipynb +++ b/docs/how_tos/building_intake-esm_catalogs.ipynb @@ -5,9 +5,9 @@ "id": "cb836184-31c3-4113-a89d-d93548f8bded", "metadata": {}, "source": [ - "# Building your own intake-esm catalog\n", + "# Building your own intake-esm datastore\n", "\n", - "You've just run a new experiment, now you want to create an intake-esm catalog for that experiment. The `access_nri_intake` Python library provides a number of Builders for different types of model output in the `esmcat.builders` submodule. For example, here we'll import the `AccessOm2Builder` for building intake-esm catalogs for ACCESS-OM2 model output." + "You've just run a new experiment, and now you want to create an intake-esm datastore for that experiment. The `access_nri_intake` Python library provides a number of Builders for different types of model output in the `esmcat.builders` submodule. For example, here we'll import the `AccessOm2Builder` for building intake-esm datastores for ACCESS-OM2 model output." ] }, { @@ -27,7 +27,7 @@ "id": "a9133e33-4706-4560-b44e-44b3d36ce780", "metadata": {}, "source": [ - "You can use this Builder to create an intake-esm catalog for your recently created data. For example, let's create an intake-esm catalog for the ACCESS-OM2 COSIMA experiment at `/g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126`." + "You can use this Builder to create an intake-esm datastore for your recently created data. 
For example, let's create an intake-esm datastore for the ACCESS-OM2 COSIMA experiment at `/g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126`." ] }, { @@ -42,21 +42,21 @@ "name": "stdout", "output_type": "stream", "text": [ - "Successfully wrote ESM catalog json file to: file:///home/599/ds0092/mycatalog.json\n", - "CPU times: user 2.75 s, sys: 1.34 s, total: 4.09 s\n", - "Wall time: 14.7 s\n" + "Successfully wrote ESM catalog json file to: file:///home/599/ds0092/mydatastore.json\n", + "CPU times: user 4.23 s, sys: 1.76 s, total: 5.99 s\n", + "Wall time: 19.6 s\n" ] } ], "source": [ "%%time\n", "\n", - "catalog_builder = AccessOm2Builder(\n", + "datastore_builder = AccessOm2Builder(\n", " path=\"/g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126\"\n", ").build()\n", "\n", - "catalog_builder.save(\n", - " name=\"mycatalog\", \n", + "datastore_builder.save(\n", + " name=\"mydatastore\", \n", " description=\"One sentence description of my experiment\", \n", ")" ] @@ -66,7 +66,7 @@ "id": "e01ccb09-1783-4856-9648-f8ddcd863a80", "metadata": {}, "source": [ - "Now you can use your intake-esm catalog to load and analyse your data" + "Now you can use your intake-esm datastore to load and analyse your data. You can read the intake-esm documentation [here](https://intake-esm.readthedocs.io/en/stable/index.html)." 
] }, { @@ -78,7 +78,7 @@ "source": [ "import intake\n", "\n", - "cat = intake.open_esm_datastore(\"mycatalog.json\", columns_with_iterables=[\"variable\"])" + "cat = intake.open_esm_datastore(\"mydatastore.json\", columns_with_iterables=[\"variable\"])" ] }, { @@ -100,7 +100,7 @@ { "data": { "text/plain": [ - "" + "" ] }, "execution_count": 5, @@ -141,9 +141,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python (access-nri-intake-dev)", "language": "python", - "name": "python3" + "name": "access-nri-intake-dev" }, "language_info": { "codemirror_mode": { diff --git a/docs/how_tos/example_usage.ipynb b/docs/how_tos/example_usage.ipynb index e7220adc..36204f4e 100644 --- a/docs/how_tos/example_usage.ipynb +++ b/docs/how_tos/example_usage.ipynb @@ -7,12 +7,12 @@ "source": [ "# Example usage of the ACCESS-NRI catalog\n", "\n", - "The premise of the ACCESS-NRI catalog is to provide a (\"meta\") catalog of intake-esm (\"sub\") catalogs, which each correspond to different \"experiments\". \n", + "The premise of the ACCESS-NRI catalog is to provide a catalog of intake-esm datastores, which each correspond to different \"experiments\". Intake-esm datastores are themselves often referred to as intake-esm \"catalogs\" since they provide a way to search and load data across a large number of assets (usually netcdf files) into distinct datasets. As such, the ACCESS-NRI catalog can be thought of as providing a catalog of subcatalogs.\n", "\n", "The idea is that users will:\n", - " - query on metadata shared across the different intake-esm subcatalogs to find the experiments that interest them\n", - " - open those subcatalogs (which may have different/additional metadata than the outer catalog)\n", - " - query further on the subcatalog(s) and eventually load some data\n", + " - query the ACCESS-NRI catalog on metadata shared across the different intake-esm datastores to find the experiments that interest them. 
For example, these queries might ask questions like \"which experiments contain model X with variable Y at frequency Z?\".\n", + " - open those datastores (which may have different/additional metadata than the ACCESS-NRI catalog). \n", + " - potentially query further on the datastore(s) and eventually load some data.\n", "\n", "Examples are given below." ] }, @@ -45,7 +45,7 @@ }, "outputs": [], "source": [ - "metacat = intake.cat.access_nri" + "nri_cat = intake.cat.access_nri" ] }, { @@ -53,7 +53,7 @@ "id": "4e3e6a73-ab24-4b11-8f6e-56a18bef20da", "metadata": {}, "source": [ - "We now have ~3 PB of data at our fingertips" + "We now have ~3 PB of data at our fingertips. These data span a wide variety of model experiments. Each row in the catalog corresponds to a different experiment, which may comprise many netcdf files. You can scroll through the experiments below:" ] }, { @@ -67,7 +67,7 @@ { "data": { "text/html": [ - "

access_nri catalog with 41 source(s) across 1990 rows:

\n", + "

access_nri catalog with 41 source(s) across 2003 rows:

\n", "