Dask delayed operations and Pint Quantity #996

Open
huard opened this issue Jan 21, 2019 · 8 comments
Labels
Area: Calc Pertains to calculations Type: Enhancement Enhancement to existing functionality

Comments

@huard

huard commented Jan 21, 2019

I couldn't tell from the documentation whether MetPy supports delayed operations with dask. The code for unit conversion seems to access _data_array.values, which suggests that the entire array is loaded into memory. We have multi-GB files that require unit conversion, and ideally the converted DataArray would be lazily evaluated.
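A minimal sketch of the distinction, assuming `dask` and `numpy` are installed. The array shape and the Kelvin-to-Celsius offset are illustrative only and are not taken from MetPy's code:

```python
import dask.array as da
import numpy as np

# Illustrative chunked array standing in for a large on-disk variable.
temps_k = da.full((1000, 1000), 273.15, chunks=(250, 250))

# Lazy path: arithmetic on the dask array only builds a task graph.
temps_c_lazy = temps_k - 273.15
is_lazy = isinstance(temps_c_lazy, da.Array)

# Eager path: converting to a NumPy array (which is what `.values` does on a
# dask-backed xarray DataArray) materializes every chunk in memory.
temps_c_eager = np.asarray(temps_k) - 273.15
is_eager = isinstance(temps_c_eager, np.ndarray)
```

Nothing is computed on the lazy path until `.compute()` is called, which is the behaviour one would hope a unit conversion could preserve.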

@dopplershift
Member

I haven't taken a look with dask yet, but your initial analysis seems about right, unfortunately.

This might be out of our control due to pint, but is definitely something on my todo list to take a look at. I can see it also being another reason to adjust how we handle the unit problem internally.

@dopplershift dopplershift added Type: Enhancement Enhancement to existing functionality Area: Calc Pertains to calculations labels Jan 22, 2019
@tjwixtrom
Contributor

Having tried this with RH calculations over a large dataset in Xarray (with Dask enabled), I can confirm that calling MetPy does load the arrays into memory. Dask still allows parallel computation on chunks, which prevents runaway RAM usage, and performance is acceptable, but lazy evaluation stops at the point of calling metpy.calc. A temporary workaround is to do all subsetting operations before the MetPy computations.
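A rough sketch of that workaround, assuming `dask` is installed. The array and the final multiplication are hypothetical stand-ins for a real input field and a metpy.calc call:

```python
import dask.array as da

# Illustrative stand-in for a large relative-humidity input field.
field = da.ones((2000, 2000), chunks=(500, 500))

# Subset while the array is still lazy: only the chunks overlapping the
# selection are needed when the result is finally computed.
subset = field[:500, :500]

# Hypothetical eager step standing in for a metpy.calc call, which would
# load whatever it is given into memory.
result = subset.compute() * 100.0
```

Here the eager step only ever sees the 500 x 500 selection rather than the full field.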

@huard
Author

huard commented Jul 22, 2019

We've worked around this issue by calling units.convert instead of using the .to() method.

@dopplershift
Member

So you're saying .to() breaks the parallelism but units.convert() doesn't?

@huard
Author

huard commented Jul 22, 2019

It's probably not that straightforward, and it's been a while, but I think that with .to() we had to copy the input array, then take the output of .to() and change the values in place. I don't quite remember why we needed this copy, but it was the culprit, not the conversion itself (see https://github.com/Ouranosinc/xclim/pull/156/files).

@jthielen
Collaborator

Just an update from upstream: Pint v0.10 (to be released in the next week or so) will have preliminary support for wrapping Dask arrays. However, dask/dask#4583 is holding up full compatibility and the ability to put together a robust set of integration tests, so there will likely be issues remaining (such as non-commutativity and Dask mistakenly wrapping Pint).

So, from MetPy's point of view, I think it would be good to start some early experiments with Dask support in calculations, but it won't be ready for v1.0?

@dopplershift
Member

That seems about right. I think overall full support will be something we look at beyond the GEMPAK work.

@jthielen
Collaborator

Leaving a note here for future Dask compatibility work: the window smoother added in #1223 explicitly casts to ndarray, which prevents Dask compatibility for that smoother (and dependent smoothers like circular, rectangular, and n-point) (see #1223 (comment)).

@jthielen jthielen added this to the 1.1 milestone Apr 17, 2020
@jthielen jthielen self-assigned this Apr 17, 2020
@dopplershift dopplershift modified the milestones: 1.1.0, 1.2.0 Aug 2, 2021
@dopplershift dopplershift modified the milestones: 1.2.0, 1.3.0 Oct 8, 2021
@dopplershift dopplershift modified the milestones: 1.3.0, May 2022 Mar 31, 2022
@dopplershift dopplershift modified the milestones: May 2022, July 2022 May 16, 2022
@dopplershift dopplershift removed this from the September 2022 milestone Sep 15, 2022
4 participants