
Getting started

Brendan Murphy edited this page Oct 8, 2024 · 6 revisions

Getting started with OpenGHG Inversions

Here is an overview of what OpenGHG Inversions does, and how to use it.

Table of Contents

  1. First steps
  2. What do you need to run an inversion?
    1. Data standardised by OpenGHG
    2. Data not standardised by OpenGHG
  3. How do you run an inversion?
    1. Using an inputs .ini file
    2. From the command line
    3. Example batch job script

First steps

OpenGHG Inversions aims to estimate emissions of atmospheric gases using an inverse modelling framework. It is not a black-box implementation: the user must choose how best to set up the estimation method for their application. OpenGHG Inversions facilitates the chosen setup and allows parameters to be changed with relative ease.

To get started, first make sure that OpenGHG Inversions is installed and set up properly. This also requires an installation of the OpenGHG package, and it is recommended that you install OpenGHG before installing OpenGHG Inversions.
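As a quick sanity check after installing, the sketch below reports whether each package can be imported in the currently active environment. The helper name check_import is hypothetical, and the module name openghg_inversions is an assumption based on the repository layout used in the commands on this page.

```shell
#!/bin/bash
# Hypothetical helper: report whether a Python module can be imported
# in the currently active conda or virtual environment.
check_import() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "$1: OK"
    else
        echo "$1: not importable"
    fi
}

check_import openghg
# Assumed module name, inferred from the repository directory name.
check_import openghg_inversions
```

If either line reports "not importable", check that the correct environment is activated before reinstalling anything.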

See the conceptual workflow for a high-level description of the steps taken to run an inversion using OpenGHG Inversions. It is also recommended that you familiarise yourself with the basic concepts of inverse modelling.

Once you are comfortable with the concepts and the installation is complete and working, it is time to prepare the inputs for the inversion.

What do you need to run an inversion?

See the section on the input files as a starting point for becoming familiar with what is needed to run an inversion.

Many of the input files needed will already have been created by somebody, so collaboration and file sharing are likely to be an easier way to get started than working from scratch. Members of the ACRG at the University of Bristol and the contributors to the OpenGHG Inversions package will be happy to help where they can.

Once the files have been created or acquired, the next step is to prepare them for use with OpenGHG Inversions.

Data standardised by OpenGHG

The OpenGHG package will be used to standardise many of the inputs into a common format. The following inputs must be standardised by OpenGHG and placed in an object store for the inversion to run. Click on the link for each input to learn a little more about it:

  • Observation data: these files are stored using the standardise_surface function in OpenGHG. You can search to see if the data you need is already available using the search_surface function in OpenGHG.
  • Flux data: these files are stored using the standardise_flux function in OpenGHG. You can search to see if the data you need is already available using the search_flux function in OpenGHG.
  • Boundary condition data: these files are stored using the standardise_bc function in OpenGHG. You can search to see if the data you need is already available using the search_bc function in OpenGHG.
  • Footprints data: these files are stored using the standardise_footprint function in OpenGHG. You can search to see if the data you need is already available using the search_footprints function in OpenGHG (NOTE: "footprints" is plural in the search function but singular in the standardise function).

Flux and boundary conditions data are usually defined for a specific domain but are not specific to any one measurement site, so you will usually use the same flux and boundary conditions for multiple sites.

Data not standardised by OpenGHG

Some of the input files needed for OpenGHG Inversions will not be standardised by the OpenGHG code but will nevertheless have to be in a format recognised by OpenGHG Inversions.

  • Country mask: this file defines the regions or countries for which you would like to calculate the total emissions. See the information here for more on what this contains and how to create a new country mask file.
  • Basis functions
    • Boundary Condition Basis Functions: this usually means a netCDF file with latitude and longitude coordinates, with integer values indicating the location of each boundary condition. See here for more information.
    • (optional) A basis function file. This is usually made on the fly during an inversion. Currently, the most detailed information on using an existing basis function file is here.

How do you run an inversion?

Using an inputs .ini file

The easiest way to run the inversion is to update an existing template inputs file. Adapt the .ini file to your own needs. Some information on the various parameters is contained within the .ini file, but a more complete description is given on the Inputs file page on this wiki. Note that only the inputs being used need to be in the .ini file; those which are not being used can be removed (unless they are marked as required).

From the command line

To run an inversion from the command line, activate the conda or virtual environment in which OpenGHG Inversions is installed, then run:

python <path/to/openghg_inversions>/openghg_inversions/hbmcmc/run_hbmcmc.py -c <yourinputsfile>.ini

where you will replace the bits in < > with your own paths and file names.
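A mistyped path is an easy mistake here, so it can help to check the inputs file before launching. The following is a minimal sketch, not part of OpenGHG Inversions: the check_ini helper and the INVERSIONS_DIR variable are hypothetical names introduced for illustration, and the script only launches the inversion if both variables have been set.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given path is an existing .ini file.
check_ini() {
    if [ ! -f "$1" ]; then
        echo "Inputs file not found: $1" >&2
        return 1
    fi
    case "$1" in
        *.ini) return 0 ;;
        *) echo "Expected a .ini file, got: $1" >&2; return 1 ;;
    esac
}

# Assumed variables: export these with your own paths before running.
INVERSIONS_DIR="${INVERSIONS_DIR:-}"   # path to your copy of openghg_inversions
INI_FILE="${INI_FILE:-}"               # path to your inputs file

if [ -n "$INVERSIONS_DIR" ] && check_ini "$INI_FILE"; then
    # Same command as above, with the placeholders filled in from the variables.
    python "$INVERSIONS_DIR/openghg_inversions/hbmcmc/run_hbmcmc.py" -c "$INI_FILE"
else
    echo "Set INVERSIONS_DIR and INI_FILE to valid paths before running." >&2
fi
```

Quoting the variables keeps the command working when a path contains spaces.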

Example batch job script

Alternatively, you may wish to run the inversion using a high performance computer (HPC). Below is an example script that can be used to run an inversion on an HPC using Slurm (a job scheduling system). The submission method can differ between systems and the below is simply the example used on one system. You will have to adapt it for your own needs.

This script assumes that you have already created a conda environment called pymc_env and you have created your .ini file. Again, you will replace the bits in < > with your own paths and file names.

#!/bin/bash
# ****************************************************************************
# Wrapper script for submitting jobs on ACRC HPC
# docs: https://www.acrc.bris.ac.uk/protected/hpc-docs/index.html
# ****************************************************************************
#SBATCH --job-name=my_inv
#SBATCH --output=openghg_inversions.out
#SBATCH --error=openghg_inversions.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=04:00:00
#SBATCH --mem=30gb
#SBATCH --account=dept123456


# Set up Python environment
module purge # optional: clear modules
module load git
module load languages/python
# load any other modules you would have on the login node
eval "$(conda shell.bash hook)"
conda activate pymc_env  # replace with your environment name

#conda info

# run inversion script
INI_FILE="<path/to/input/file>/<yourinputsfile>.ini"
python "<path/to/openghg_inversions>/openghg_inversions/hbmcmc/run_hbmcmc.py" -c "$INI_FILE"

If this script is saved as my_inversion_script.sh, you would run it with sbatch my_inversion_script.sh.
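Once the job is submitted, you will usually want to keep track of it. The sketch below uses standard Slurm commands (sbatch --parsable, squeue, sacct, scancel); the strip_cluster helper is a hypothetical convenience, and the exact options available may differ on your system. The block does nothing except print a message if sbatch or the script is not available.

```shell
#!/bin/bash
# Hypothetical helper: "sbatch --parsable" prints "jobid" or "jobid;cluster";
# keep only the job ID.
strip_cluster() {
    echo "${1%%;*}"
}

SCRIPT=my_inversion_script.sh

if command -v sbatch >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    job_id=$(strip_cluster "$(sbatch --parsable "$SCRIPT")")
    echo "Submitted job $job_id"
    squeue -j "$job_id"    # show the job's current state in the queue
    # Later, as needed:
    #   tail openghg_inversions.out    # output file named in the #SBATCH lines
    #   sacct -j "$job_id" --format=JobID,State,Elapsed,MaxRSS
    #   scancel "$job_id"              # cancel the job
else
    echo "sbatch or $SCRIPT not found; run this where Slurm is available."
fi
```

The elapsed time and memory reported by sacct can help you adjust the --time and --mem requests for later runs.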