Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support calculation of indices (e.g. precipitable water) using gridded data #1533

Open
DanielAdriaansen opened this issue Oct 6, 2020 · 4 comments
Labels
Type: Feature New functionality

Comments

@DanielAdriaansen
Copy link
Contributor

I have an Xarray Dataset object and it has 3D variables of atmospheric data stored in it. Essentially, a collection of nx X ny columns of data. I am attempting to compute precipitable water using this 3D grid, but have been unsuccessful doing it any other way besides looping in nx and ny, storing the result from precipitable_water() in an np ndarray, and then adding it to my Xarray Dataset object. I tried various flavors of xarray apply_ufunc(), but I don't think I am advanced enough of a user to get it to broadcast the precipitable_water() function across my Dataset correctly. Or maybe it's not even possible.

In MetPy, It would be nice to be able to pass these variables in and get back a 2D nx X ny grid of precipitable_water back into my Dataset, or as a separate DataArray object since the dimensions are reduced from 3D to 2D in this calculation.

Something like:
pw_da = mpcalc.precipitable_water(ds['P'],ds['TD'],bottom=pbot,top=ptop)

where pw_da would be a DataArray object since two Dataset objects were passed into the function.

I made the issue generic, since there may be other indices, or calculations that fit this model in MetPy that I haven't thought about where the user would want to distill 3D data into a 2D field of some sort.

After perusing current issues and PR's, I found #1490 , and this may help me but I am not entirely sure.

I originally posted a question on SO awhile back here: https://stackoverflow.com/questions/61255329/pythonic-way-to-compute-precipitable-water, where @dopplershift alluded to some assumed 1D behavior at least in precipitable_water(), so in addition to Xarray-type enhancements there is probably also some 1D to 3D changes needed to achieve this.

@DanielAdriaansen DanielAdriaansen added the Type: Feature New functionality label Oct 6, 2020
@ahuang11
Copy link
Contributor

ahuang11 commented Oct 6, 2020

I know this doesn't isn't reusable for other calculations, but if 'P' is a 1D dimension of 'TD', maybe you could do:
-ds.sel(P=slice(pbot, ptop)).integrate('P') * 1 / rho / g

@dopplershift
Copy link
Member

@DanielAdriaansen Thanks for opening. This is definitely something we want to enable now that we have a fairly large set of useful calculations. Not sure on a schedule for that, though...

@jthielen
Copy link
Collaborator

jthielen commented Oct 6, 2020

Agreed, thanks for opening!

#1490 does not resolve this, as of that PR (and the current state of the master branch), precipitable_water still only works on 1D profiles. The main stopping point is MetPy's get_layer functionality, which does not support gridded data. This is a prime candidate to be updated to use xarray in a future release, which would also allow things like SRH (#1258) and bulk shear to permit gridded data, where the only thing stopping them from being vectorized is get_layer. Not that it is directly in the interest of this issue, but CAPE and other parcel profile-dependent calculations, however, would take a bit more work to support gridded data due to the vertical iteration needed (see #480).

@ahuang11
Copy link
Contributor

ahuang11 commented Oct 7, 2020

I guess another way is to use map blocks (must be chunked with dask) and then perform your loop inside so that it's kind of parallelized.

def xr_precipitable_water(ds):
    da_prws = []
    for lat in ds['lat']:
        for lon in ds['lon']:
            ds_sub = ds.sel(lat=lat, lon=lon)
            ds_sub['prw'] = precipitable_water(
                ds_sub['dpt'].values * units('degC'),
                ds_sub['level'].values * units('hPa')
            ).m
            da_prw = ds_sub['prw']
            da_prws.append(da_prw.expand_dims(['lat', 'lon']))
    da_prw = xr.combine_by_coords(da_prws)
    return da_prw

ds = ds.chunk({'lat': 37, 'lon': 36})
template = ds.isel(level=0, drop=True)[['air']].rename({'air': 'prw'})
ds_prw = xr.map_blocks(xr_precipitable_water, ds, template=template).load()

Executed in 2 mins 6 seconds (tested on my personal machine with 4 cores)

Compared to the absolute serial case

da_prws = []
for lat in ds['lat']:
    for lon in ds['lon']:
        ds_sub = ds.sel(lat=lat, lon=lon)
        ds_sub['prw'] = precipitable_water(
            ds_sub['dpt'].values * units('degC'),
            ds_sub['level'].values * units('hPa')
        ).m
        da_prw = ds_sub['prw']
        da_prws.append(da_prw.expand_dims(['lat', 'lon']))
da_prw = xr.combine_by_coords(da_prws)

Executed in 2 mins 35 seconds

Entire script:

from metpy.calc import precipitable_water, dewpoint_from_relative_humidity
from metpy.units import units
import xarray as xr

ds = xr.open_mfdataset('*.mon.mean.nc').sortby('lat')

ds['dpt'] = (
    ('level', 'lat', 'lon'),
    dewpoint_from_relative_humidity(
        ds['air'].values * units('degC'), ds['rhum'])
)

def xr_precipitable_water(ds):
    da_prws = []
    for lat in ds['lat']:
        for lon in ds['lon']:
            ds_sub = ds.sel(lat=lat, lon=lon)
            ds_sub['prw'] = precipitable_water(
                ds_sub['dpt'].values * units('degC'),
                ds_sub['level'].values * units('hPa')
            ).m
            da_prw = ds_sub['prw']
            da_prws.append(da_prw.expand_dims(['lat', 'lon']))
    da_prw = xr.combine_by_coords(da_prws)
    return da_prw

ds = ds.chunk({'lat': 37, 'lon': 36})
template = ds.isel(level=0, drop=True)[['air']].rename({'air': 'prw'})
ds_prw = xr.map_blocks(xr_precipitable_water, ds, template=template).load()

da_prws = []
for lat in ds['lat']:
    for lon in ds['lon']:
        ds_sub = ds.sel(lat=lat, lon=lon)
        ds_sub['prw'] = precipitable_water(
            ds_sub['dpt'].values * units('degC'),
            ds_sub['level'].values * units('hPa')
        ).m
        da_prw = ds_sub['prw']
        da_prws.append(da_prw.expand_dims(['lat', 'lon']))
da_prw = xr.combine_by_coords(da_prws)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New functionality
Projects
None yet
Development

No branches or pull requests

4 participants