Commit

Merge branch 'main' into conus404_data (lots of changes on main)
sethmcg committed Oct 17, 2024
2 parents eb6f0e5 + 7357df0 commit 12fc897
Showing 124 changed files with 21,304 additions and 8,913 deletions.
20 changes: 9 additions & 11 deletions .github/workflows/python-package-conda.yml
@@ -9,14 +9,10 @@ jobs:
      max-parallel: 5

    steps:
-    - uses: actions/checkout@v2
-    - uses: conda-incubator/setup-miniconda@v2
+    - uses: actions/checkout@v4
+    - uses: mamba-org/setup-micromamba@v1
      with:
-        mamba-version: "*"
-        channel-priority: true
-        environment-file: environment.yml
-        auto-activate-base: false
-        activate-environment: test
+        environment-file: environment_cpu.yml
    - shell: bash -l {0}
      run: |
        conda info
@@ -27,13 +23,15 @@ jobs:
    - name: Lint with flake8
      shell: bash -l {0}
      run: |
-        mamba install flake8
+        micromamba install ruff
        # stop the build if there are Python syntax errors or undefined names
-        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+        ruff check --select=E9,F63,F7,F82
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+        ruff check --output-format concise --exit-zero
+        # Checking documentation errors
+        ruff check --select=D --exit-zero --statistics
    - name: Test with pytest
      shell: bash -l {0}
      run: |
-        mamba install pytest
+        micromamba install pytest
        pytest
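
The rewritten lint step can be reproduced locally with the same ruff invocations the workflow now runs — a minimal sketch, assuming ruff is installed in the active environment (e.g. via `micromamba install ruff`):

```bash
# Fail on syntax errors and undefined names only, mirroring the CI gate
ruff check --select=E9,F63,F7,F82
# Report everything else without failing (exit code 0), one finding per line
ruff check --output-format concise --exit-zero
# Summarize docstring (rule set D) issues without failing
ruff check --select=D --exit-zero --statistics
```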
51 changes: 38 additions & 13 deletions README.md
@@ -5,36 +5,61 @@ CREDIT is a package to train and run neural networks
 that can emulate full NWP models by predicting
 the next state of the atmosphere given the current state.

-## Installation
+## NSF-NCAR Derecho Installation
+Currently, the framework for running miles-credit in parallel is centered on NSF-NCAR's Derecho HPC. Derecho requires building several of miles-credit's dependencies locally, including PyTorch, to enable a correct MPI configuration. To begin, create a clone of the pre-built miles-credit environment, which contains compatible versions of torch, torchvision, numpy, and others.
+
+```bash
+module purge
+module load ncarenv/23.09 gcc/12.2.0 ncarcompilers cray-mpich/8.1.27 cuda/12.2.1 cudnn/8.8.1.3-12 conda/latest
+conda create --name credit-derecho --clone /glade/derecho/scratch/benkirk/derecho-pytorch-mpi/envs/credit-pytorch-v2.3.1-derecho-gcc-12.2.0-cray-mpich-8.1.27
+```
+
+Going forward, take care when installing new packages so that PyTorch and the other locally built miles-credit dependencies are not overridden. Next, grab the most recent version of miles-credit from GitHub (assuming no changes to the locally built dependencies):
+
+```bash
+conda activate credit-derecho
+git clone git@github.com:NCAR/miles-credit.git
+cd miles-credit
+```
+
+and then install it without dependencies:
+
+```bash
+pip install --no-deps .
+```
+
+Henceforth, when adding new packages, aim to use the no-dependencies option.
+
+## Standard Installation
 Clone from the miles-credit github page:
 ```bash
 git clone git@github.com:NCAR/miles-credit.git
 cd miles-credit
 ```

-Install dependencies using environment.yml file:
+Install dependencies using the environment_gpu.yml file (also compatible with CPU-only machines):

+Note: if you are on NCAR HPC, we recommend installing to your home directory. To do this, simply append `-p /glade/u/home/$USER/[your_install_dir]/` to the `conda/mamba env create` command below:
+
 ```bash
-mamba env create -f environment.yml
+mamba env create -f environment_gpu.yml
 conda activate credit
 ```

+CPU-only install:
+```bash
+mamba env create -f environment_cpu.yml
+conda activate credit
+```
+
 Some metrics use WeatherBench2 for computation. Install with:
 ```bash
 git clone git@github.com:google-research/weatherbench2.git
 cd weatherbench2
 pip install .
 ```

-To enable GPU support, install pytorch-cuda:
-```bash
-mamba install pytorch-cuda=12.1 -c pytorch -c nvidia
-```
-
 Install miles-credit with the following command:
 ```bash
 pip install .
 ```

 ## Train a Segmentation Model (like a U-Net)
 ```bash
 python applications/train.py -c config/unet.yml
 ```
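
A minimal sketch of the "no dependencies" guidance from the Derecho section above, for later additions to the cloned environment — `some-package` is a placeholder for whatever you are installing:

```bash
# Install only the named package, leaving the locally built torch/MPI stack untouched
pip install --no-deps some-package
# Confirm the pinned PyTorch build survived
python -c "import torch; print(torch.__version__)"
```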
30 changes: 16 additions & 14 deletions applications/calc_global_solar.py
@@ -16,18 +16,18 @@ def main():
     parser.add_argument('-s', '--start', type=str, default="2000-01-01", help="Start date (inclusive)")
     parser.add_argument('-e', '--end', type=str, default="2000-12-31 23:00", help="End date (inclusive)")
     parser.add_argument('-t', '--step', type=str, default="1h", help="Step frequency")
-    parser.add_argument('-u', '--sub', type=str, default="5Min", help="Sub Frequency")
+    parser.add_argument('-u', '--sub', type=str, default="1Min", help="Sub Frequency")
     parser.add_argument('-i', '--input', type=str,
-                        default="/glade/u/home/wchapman/MLWPS/DataLoader/static_variables_ERA5_zhght.nc",
+                        default="/glade/u/home/wchapman/MLWPS/DataLoader/LSM_static_variables_ERA5_zhght.nc",
                         help="File containing longitudes, latitudes, and geopotential height.")
     parser.add_argument("-g", "--geo", type=str, default="Z_GDS4_SFC",
                         help="Geopotential height variable.")
     parser.add_argument("-o", "--output", type=str, required=True, help="Output directory")
     args = parser.parse_args()
     grid_points_sub = None
+    start_date_ts = pd.Timestamp(args.start)
+    end_date_ts = pd.Timestamp(args.end)
     if rank == 0:
-        start_date_ts = pd.Timestamp(args.start)
-        end_date_ts = pd.Timestamp(args.end)
         dates = pd.date_range(start=start_date_ts,
                               end=end_date_ts, freq=args.step)
         with xr.open_dataset(args.input) as static_ds:
@@ -41,25 +41,23 @@ def main():
             )
             heights = static_ds[args.geo].values / 9.81
             grid_points = np.vstack([lon_grid.ravel(), lat_grid.ravel(), heights.ravel()]).T
-            print(grid_points.shape)
-            split_indices = np.round(np.linspace(0, grid_points.shape[0], size + 1)).astype(int)
-            print(split_indices)
+            split_indices = np.round(np.linspace(0, grid_points.shape[0], size + 1)).astype(int)
             grid_points_sub = [grid_points[split_indices[s]:split_indices[s+1]] for s in range(split_indices.size - 1)]
     rank_points = comm.scatter(grid_points_sub, root=0)
-    print(rank_points.shape)
     for r, rank_point in enumerate(rank_points):
         if r % 10 == 0:
             print(rank, rank_point, r, rank_points.shape[0])
         solar_point = get_solar_radiation_loc(rank_point[0], rank_point[1], rank_point[2],
-                                      args.start, args.end, step_freq=args.step, sub_freq=args.sub)
+                                              args.start, args.end, step_freq=args.step, sub_freq=args.sub)
         if rank > 0:
             comm.Send(np.concatenate([solar_point["latitude"].values,
                                       solar_point["longitude"].values,
-                                      solar_point.values.ravel()]), dest=0, tag=rank)
+                                      solar_point["tsi"].values.ravel()]), dest=0, tag=rank)
         else:
-            solar_grid.loc[:, solar_point["latitude"], solar_point["longitude"]] = solar_point
+            solar_grid.loc[:, solar_point["latitude"], solar_point["longitude"]] = solar_point["tsi"].values
             for sr in range(1, size):
-                other_point = np.empty(2 + solar_grid.shape[0], dtype=solar_point.dtype)
+                other_point = np.empty(2 + solar_grid.shape[0], dtype=solar_point["tsi"].dtype)
                 comm.Recv(other_point, source=sr, tag=sr)
                 solar_grid.loc[:, other_point[0], other_point[1]] = other_point[2:]

@@ -68,10 +66,14 @@ def main():
     print(solar_grid.max())
     if not os.path.exists(args.output):
         os.makedirs(args.output)
-    out_time = pd.Timestamp.utcnow().strftime("%Y-%m-%d_%H%M")
-    filename = f"solar_radiation_{out_time}.nc"
+    date_format = "%Y-%m-%d_%H%M"
+    out_time = pd.Timestamp.utcnow().strftime(date_format)
+    start_date_str = start_date_ts.strftime(date_format)
+    end_date_str = end_date_ts.strftime(date_format)
+    filename = f"solar_irradiance_{start_date_str}_{end_date_str}.nc"
     print("Saving")
-    solar_grid.to_netcdf(os.path.join(args.output, filename), encoding={"tsi": {"zlib": True, "complevel": 4}})
+    solar_grid.to_netcdf(os.path.join(args.output, filename),
+                         encoding={"tsi": {"zlib": True, "complevel": 1, "shuffle": True}})
     return


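The script distributes grid points across MPI ranks with comm.scatter and gathers results on rank 0, so it is meant to be launched under MPI. A hypothetical invocation — the rank count (36) and output directory are placeholders, and the remaining flags echo the defaults shown above:

```bash
# Rank 0 scatters grid-point chunks; workers send their computed
# irradiance series back via comm.Send / comm.Recv
mpiexec -n 36 python applications/calc_global_solar.py \
    -s "2000-01-01" -e "2000-12-31 23:00" -t 1h -u 1Min \
    -o /glade/derecho/scratch/$USER/solar_out
```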
3 changes: 2 additions & 1 deletion applications/graph_edges.py
@@ -19,6 +19,7 @@ def main():
     lon = coords["longitude"].values
     lon[lon > 180] = lon[lon > 180] - 360.0
     lat = coords["latitude"].values
+    resolution = 'onedeg' if abs(lat[1] - lat[0]) > 0.5 else 'quarter'
     lon_grid, lat_grid = np.meshgrid(lon, lat)
     lon_flat = lon_grid.ravel()
     lat_flat = lat_grid.ravel()
@@ -43,7 +44,7 @@ def main():
     if not exists(args.out):
         makedirs(args.out)
     print("Saving to " + args.out)
-    output_ds.to_netcdf(join(args.out, f"grid_edge_pairs_{args.dist:0.0f}.nc"))
+    output_ds.to_netcdf(join(args.out, f"grid_edge_pairs_{args.dist:0.0f}_{resolution}.nc"))
     return


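The new `resolution` tag ('onedeg' when the latitude spacing exceeds 0.5°, 'quarter' otherwise) is now baked into the output filename alongside the distance threshold. A hypothetical invocation — the coordinate and output paths are placeholders, and the distance flag is assumed to be `--dist` (matching `args.dist`, since the argument definitions are above the fold):

```bash
# For a 0.25-degree grid, this would write grid_edge_pairs_200_quarter.nc
python applications/graph_edges.py -c era5_coords.nc -o edges/ --dist 200
```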
56 changes: 56 additions & 0 deletions applications/graph_edges_knn.py
@@ -0,0 +1,56 @@
import xarray as xr
import argparse
import numpy as np
from os.path import join, exists
from os import makedirs
from sklearn.neighbors import BallTree


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--coord", help="Path to xarray file containing coordinates")
    parser.add_argument("-o", "--out", help="Path to output directory")
    parser.add_argument("-k", "--k_neigh", type=int, help="Number of neighbors")
    # parser.add_argument("-p", "--procs", type=int, help="Number of processes")
    args = parser.parse_args()
    coords = xr.open_dataset(args.coord)
    lon = coords["longitude"].values
    lon[lon > 180] = lon[lon > 180] - 360.0  # wrap longitudes to [-180, 180]
    lat = coords["latitude"].values

    # Tag the output by grid spacing: one-degree vs. quarter-degree grids
    resolution = 'onedeg' if abs(lat[1] - lat[0]) > 0.5 else 'quarter'

    lon_grid, lat_grid = np.meshgrid(lon, lat)
    lon_flat = lon_grid.ravel()
    lat_flat = lat_grid.ravel()
    print("Size:", lon_flat.size)

    # BallTree with the haversine metric expects [lat, lon] pairs in radians
    lat_lon = np.stack([lat_flat, lon_flat], axis=-1)
    rad_lat_lon = np.deg2rad(lat_lon)
    tree = BallTree(rad_lat_lon, metric='haversine')

    # k nearest neighbors (including each point itself) for every grid point
    distances, indices = tree.query(rad_lat_lon, k=args.k_neigh)

    node_indices = np.arange(len(rad_lat_lon)).reshape(-1, 1)
    node_indices = np.tile(node_indices, reps=args.k_neigh)

    # Haversine distances are central angles; scale by Earth's radius to get km
    EARTH_RADIUS = 6_371  # in km
    dist_arr = distances.reshape(-1) * EARTH_RADIUS
    edge_indices_arr = np.stack([indices.reshape(-1), node_indices.reshape(-1)], axis=-1)

    output_ds = xr.Dataset({"edges": (("node", "pair"), edge_indices_arr),
                            "distances": (("node",), dist_arr),
                            "longitude": (("index",), lon_flat),
                            "latitude": (("index",), lat_flat),
                            }, coords={"index": list(range(lon_flat.size))},
                           attrs=dict(coord_file=args.coord, k_neighbors=args.k_neigh))
    if not exists(args.out):
        makedirs(args.out)
    filename = join(args.out, f"grid_edge_pairs_k_{args.k_neigh}_{resolution}.nc")
    output_ds.to_netcdf(filename)
    print("Saved to " + filename)


if __name__ == "__main__":
    main()
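
A hypothetical run of the new script — `era5_coords.nc` is a placeholder for any dataset with `longitude` and `latitude` coordinates:

```bash
# Build a k-nearest-neighbor (k=8) edge list over the lat/lon grid
python applications/graph_edges_knn.py -c era5_coords.nc -o edges/ -k 8
```

Note the unit handling: BallTree's haversine metric consumes [lat, lon] in radians and returns great-circle central angles, which the script converts to kilometers by multiplying by the 6,371 km Earth radius.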