Commit
Merge pull request #2 from nicholasbalasus/dev
merge dev into main
Showing 28 changed files with 2,671 additions and 3,221 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,15 @@ | ||
# Blended TROPOMI GOSAT Methane Product | ||
The entire project can be run by running `run.sh` (e.g., `sbatch -J bash -p seas_compute -t 3-00:00 --mem 1000 --wrap "bash run.sh" --output run.out`). The files that run, where they run, where the outputs are saved, and everything else is controlled by `config.yml`. All of the code that I have written is in `src/`, while code from others is in `tools/`. | ||
|
||
The project is broken into modules for downloading data, processing data, and writing data. Each of these modules is broken down below with approximations for their run times (run on the `serial_requeue` partition of Harvard's Cannon cluster), number of cores requested, and total amount of memory requested. Because I was using `serial_requeue`, the resources requested are large. These can be reduced in exchange for longer run times. At the end of the project, the storage directory specified in `config.yml` will be ~1.3 TB. After all of the modules have been run, `notebooks/paper.ipynb` can be run to make the figures. | |
|
||
* Module 1: Download data | ||
* **Download_GOSAT**: download GOSAT level 2 data from UoL for 2018-2021 (~5 minutes, 1 core, 4 GB). | ||
* **Download_TROPOMI**: download TROPOMI level 2 data for 2018-2021 from the SRON ftp (~200 minutes, 8 cores, 32 GB). | ||
* **Download_TCCON**: download TCCON data from tccondata.org (~1 minute, 1 core, 4 GB). | ||
|
||
* Module 2: Process data | ||
* **Process_GOSAT**: process all daily netCDF GOSAT data to one dataframe (~210 minutes, 1 core, 8 GB). | ||
* **Process_TROPOMI**: process each netCDF TROPOMI file to a pickled dataframe (~100 minutes, 1024 cores, 3072 GB). | ||
* **Pair_TROPOMI_GOSAT**: pair TROPOMI and GOSAT measurements with time and distance thresholds specified in `config.yml` (~1000 minutes, 1024 cores, 3072 GB). | |
* **Process_TROPOMI_GOSAT_Pairs**: concatenate all pairs and calculate delta(TROPOMI-GOSAT) (~10 minutes, 1 core, 160 GB). | ||
* **Pair_GOSAT_TCCON**: make dataframes of GOSAT/TCCON pairs (with and without global GOSAT offset) for each TCCON site (~2 minutes, 25 cores, 160 GB). | ||
* **Run_FLAML**: train models to predict delta(TROPOMI-GOSAT) (~90 minutes, 8 cores, 64 GB). | ||
* **Predict_Delta_GOSAT_TROPOMI**: predict and remove delta(TROPOMI-GOSAT) from all TROPOMI data (~10 minutes, 1024 cores, 3072 GB). | ||
* **Pair_TROPOMI_TCCON**: make dataframes of TROPOMI/TCCON pairs and Blended/TCCON pairs for each TCCON site (~100 minutes, 50 cores, 250 GB). | ||
* **SHAP_Explainer**: make SHAP explainer and calculate shap values for train data (~320 minutes, 1 core, 64 GB). | ||
|
||
* Module 3: Write data | ||
* **Write_NetCDF**: write netCDF files that mimic the original TROPOMI data but add a variable for the blended product (~5 minutes, 512 cores, 1536 GB). | ||
* **Paired_Regrid**: regrid the TROPOMI and GOSAT pairs to a standard grid (~180 minutes, 1 core, 128 GB). | |
* **TROPOMI_Regrid**: regrid the TROPOMI data to a standard grid (~1100 minutes, 1 core, 200 GB). | ||
* **Oversample_TROPOMI**: for specific regions, oversample TROPOMI data to 0.01 degrees (~600 minutes, 4 cores, 1200 GB). | ||
# Blended TROPOMI+GOSAT Methane Product | ||
The entire project can be run by running `run.sh` (e.g., `sbatch -J bash -p huce_intel -t 14-00:00 --mem 32000 --wrap "bash run.sh" --output run.out`). The files that run, where they run, where the outputs are saved, and everything else is controlled by `config.yml`. All of the code that I have written is in `src/`, while code from others is in `tools/`. | ||
|
||
The project is broken into modules, each with approximations for their run times (run on the `huce_ice` partition of Harvard's Cannon cluster), number of cores requested, and total amount of memory requested. At the end of the project, the storage directory specified in `config.yml` will be ~1 TB. After all of the modules have been run, `notebooks/paper.ipynb` can be run to make the figures. | |
|
||
0. **Make_Conda_Env**: make or update the conda environment specified in `environment.yml`. | ||
1. **Download_GOSAT**: download GOSAT data from UoL for 2018-2021 (~5 minutes, 1 core, 4 GB). | ||
2. **Download_TROPOMI**: download operational TROPOMI data for 2018-2021 from the copernicus hub (~1.5 days, 8 cores, 32 GB). | ||
3. **Download_TCCON**: download TCCON data from tccondata.org (~1 minute, 1 core, 4 GB). | ||
4. **Calculate_Delta_GOSAT_TCCON**: for each TCCON station, find GOSAT pairs for TCCON observations and calculate delta(GOSAT-TCCON) (~3.5 hours, 25 cores, 500 GB). | ||
5. **Calculate_Delta_TROPOMI_GOSAT**: pair TROPOMI and GOSAT measurements and calculate delta(TROPOMI-GOSAT) (~4 days, 64 cores, 500 GB). | ||
6. **Run_FLAML_SHAP**: train models to predict delta(TROPOMI-GOSAT) then run SHAP (~4.5 hours, 8 cores, 64 GB). | ||
7. **Write_Blended_Files**: write netCDF files with an added variable for `methane_mixing_ratio_blended` (~1.5 hours, 16 jobs at 64 cores, 500 GB) | ||
8. **Calculate_Delta_TROPOMI_TCCON**: for each TCCON station, find TROPOMI pairs for TCCON observations and calculate delta(TROPOMI-TCCON) (~10 hours, 25 jobs at 64 cores, 500 GB). | ||
9. **Oversample_TROPOMI**: oversample the TROPOMI and Blended data to a 0.01 degree grid for 2021 (~8.5 hours, 2 cores, 500 GB). |
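The pairing step described in the README (pair TROPOMI and GOSAT measurements, then calculate delta(TROPOMI-GOSAT)) can be illustrated with a minimal sketch. This is not the code in `src/`, which is not shown in this diff. The record layout (`time`, `lat`, `lon`, `xch4`), the function names, and the brute-force loop are all hypothetical, and the units of `TimeThreshold` (taken here as minutes) and `DistanceThreshold` (taken here as kilometers) are assumptions, since `config.yml` does not state them.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pair_and_delta(tropomi, gosat, time_threshold_min=60, distance_threshold_km=5):
    """Return every TROPOMI/GOSAT pair within both thresholds.

    Assumption: thresholds are in minutes and kilometers. Each record is a
    hypothetical dict with keys "time", "lat", "lon", and "xch4".
    """
    max_dt = timedelta(minutes=time_threshold_min)
    pairs = []
    for t in tropomi:
        for g in gosat:
            if abs(t["time"] - g["time"]) > max_dt:
                continue  # too far apart in time
            if haversine_km(t["lat"], t["lon"], g["lat"], g["lon"]) > distance_threshold_km:
                continue  # too far apart in space
            pairs.append({
                "tropomi_xch4": t["xch4"],
                "gosat_xch4": g["xch4"],
                "delta": t["xch4"] - g["xch4"],  # delta(TROPOMI-GOSAT)
            })
    return pairs


# Made-up observations, for illustration only.
tropomi_obs = [{"time": datetime(2021, 1, 1, 12, 0), "lat": 0.0, "lon": 0.0, "xch4": 1850.0}]
gosat_obs = [
    {"time": datetime(2021, 1, 1, 12, 30), "lat": 0.0, "lon": 0.01, "xch4": 1840.0},  # close in time and space
    {"time": datetime(2021, 1, 1, 12, 30), "lat": 1.0, "lon": 1.0, "xch4": 1840.0},   # too far away
    {"time": datetime(2021, 1, 1, 15, 0), "lat": 0.0, "lon": 0.0, "xch4": 1840.0},    # too late
]
pairs = pair_and_delta(tropomi_obs, gosat_obs)
```

The nested loop is quadratic in the number of observations, so it only serves to show the filtering logic. A run over four years of data would need a spatial index or binning, which is consistent with the multi-day run time quoted for this module.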
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,32 +1,22 @@ | ||
RunDir: "/n/home06/nbalasus/blended_tropomi_gosat_methane" | ||
StorageDir: "/n/holylfs05/LABS/jacob_lab/nbalasus/blended_tropomi_gosat_methane" | ||
Partition: serial_requeue | ||
Cores: 16 | ||
CondaEnv: ch4_env | ||
Partition: huce_ice | ||
CondaEnv: ch4_env | ||
Debug: true | ||
|
||
TimeThreshold: 60 | ||
DistanceThreshold: 5 | ||
GlobalOffsetGOSAT: 8.9 | ||
TimeFLAML: 1800 | ||
GlobalOffsetGOSAT: 9.2 | ||
TimeFLAML: 3600 | ||
Model: lgbm | ||
a: 1.26 | ||
b: 0.13 | ||
a: 1.18 | ||
b: -0.40 | ||
|
||
Make_Conda_Env: true | ||
Download_GOSAT: true | ||
Download_TROPOMI: true | ||
Download_TCCON: true | ||
Process_GOSAT: true | ||
Process_TROPOMI: true | ||
Pair_TROPOMI_GOSAT: true | ||
Process_TROPOMI_GOSAT_Pairs: true | ||
Pair_GOSAT_TCCON: true | ||
Run_FLAML: true | ||
Predict_Delta_GOSAT_TROPOMI: true | ||
Pair_TROPOMI_TCCON: true | ||
SHAP_Explainer: true | ||
Write_NetCDF: true | ||
Paired_Regrid: true | ||
GOSAT_Regrid: true | ||
TROPOMI_Regrid: true | ||
Oversample_TROPOMI: true | ||
Calculate_Delta_GOSAT_TCCON: true | ||
Calculate_Delta_TROPOMI_GOSAT: true | ||
Run_FLAML_SHAP: true | ||
Write_Blended_Files: true | ||
Calculate_Delta_TROPOMI_TCCON: true | ||
Oversample_TROPOMI: true |
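The boolean flags in `config.yml` select which modules run. How `run.sh` actually reads them is not shown in this diff, so the following is only a sketch under stated assumptions: the config has already been loaded into a dict (for example with `yaml.safe_load` from the `pyyaml` package listed in `environment.yml`), and the `enabled_modules` helper and `PIPELINE_ORDER` list are hypothetical names that follow the numbered list in the README.

```python
# Module order taken from the numbered list in the updated README (steps 0-9).
PIPELINE_ORDER = [
    "Make_Conda_Env",
    "Download_GOSAT",
    "Download_TROPOMI",
    "Download_TCCON",
    "Calculate_Delta_GOSAT_TCCON",
    "Calculate_Delta_TROPOMI_GOSAT",
    "Run_FLAML_SHAP",
    "Write_Blended_Files",
    "Calculate_Delta_TROPOMI_TCCON",
    "Oversample_TROPOMI",
]


def enabled_modules(config, order=PIPELINE_ORDER):
    """Return the modules whose flag is exactly True, in pipeline order.

    `config` is assumed to be the dict produced by loading config.yml.
    Missing keys and non-boolean values are treated as disabled.
    """
    return [name for name in order if config.get(name) is True]


# A hypothetical partial config: rerun only the training and writing steps.
example_config = {
    "Partition": "huce_ice",
    "Make_Conda_Env": False,
    "Download_GOSAT": False,
    "Run_FLAML_SHAP": True,
    "Write_Blended_Files": True,
}
to_run = enabled_modules(example_config)
```

Keeping the order in one list, separate from the flags, means the keys in the YAML file can appear in any order without changing the order in which the modules would run.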
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,23 @@ | ||
name: ch4_env | ||
name: blnd_env | ||
channels: | ||
- conda-forge | ||
- defaults | ||
dependencies: | ||
- python=3.9 | ||
- matplotlib=3.5.2 | ||
- numpy=1.21.5 | ||
- jupyterlab=3.4.0 | ||
- ipykernel=6.15.2 | ||
- matplotlib=3.7.1 | ||
- numpy=1.23.5 | ||
- jupyterlab=3.6.1 | ||
- ipykernel=6.19.2 | ||
- notebook=6.5.2 | ||
- jupyter=1.0.0 | ||
- netcdf4=1.5.8 | ||
- pandas=1.4.2 | ||
- cartopy=0.20.2 | ||
- pyproj=3.3.1 | ||
- scipy=1.7.3 | ||
- netcdf4=1.6.3 | ||
- pandas=1.5.3 | ||
- cartopy=0.21.1 | ||
- pyproj=3.4.1 | ||
- scipy=1.10.0 | ||
- shap=0.41.0 | ||
- geopy=2.2.0 | ||
- geopandas=0.10.2 | ||
- flaml=1.0.14 | ||
- pyyaml=6.0 | ||
- geopy=2.3.0 | ||
- geopandas=0.12.2 | ||
- flaml=1.1.3 | ||
- pyyaml=6.0 | ||
- scikit-learn=1.2.0 |