Dask delayed operations and Pint Quantity #996
Comments
I haven't taken a look with dask yet, but your initial analysis seems about right, unfortunately. This might be out of our control due to pint, but is definitely something on my todo list to take a look at. I can see it also being another reason to adjust how we handle the unit problem internally.
Upon attempting this with RH calculations over a large dataset in Xarray (with Dask enabled) I can confirm that calling MetPy does load the arrays into memory. Dask still allows for parallel computations on chunks, which does keep from runaway RAM usage, and performance is acceptable, but the lazy evaluation stops at the point of calling metpy.calc. The temporary workaround would be to do all subsetting operations before MetPy computations.
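A minimal sketch of that workaround, assuming a Dask-backed dataset with hypothetical variable names (`temperature`, `dewpoint`), placeholder file name and chunk sizes, and variables that carry units metadata MetPy can interpret:

```python
import xarray as xr
import metpy.calc as mpcalc

# Open lazily with Dask chunks (file name and chunk sizes are illustrative)
ds = xr.open_dataset("model_output.nc", chunks={"time": 10})

# Do all subsetting while everything is still lazy
subset = ds[["temperature", "dewpoint"]].isel(time=slice(0, 24))

# Only now call MetPy; this loads the (much smaller) subset into memory
rh = mpcalc.relative_humidity_from_dewpoint(subset["temperature"],
                                            subset["dewpoint"])
```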
We've worked around this issue by calling
So you're saying
It's probably not that straightforward and it's been a while, but I think using
Just an update from upstream: Pint v0.10 (to be released in the next week or so) will have preliminary support for wrapping Dask arrays. However, dask/dask#4583 is holding up full compatibility and the ability to put together a robust set of integration tests, so there will likely be issues remaining (such as non-commutativity and Dask mistakenly wrapping Pint). So, from MetPy's point of view, I think it would be good to start some early experiments with Dask support in calculations, but it won't be ready for v1.0?
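For reference, the kind of wrapping described there looks roughly like the sketch below (illustrative only; support in Pint 0.10 was preliminary, so exact behavior may differ):

```python
import dask.array as da
from pint import UnitRegistry

ureg = UnitRegistry()

# Wrap a lazy Dask array in a Pint Quantity
lazy = da.ones((1000, 1000), chunks=(250, 250))
q = ureg.Quantity(lazy, "meter")

# Unit-aware operations keep building the Dask graph without computing
q_km = q.to("kilometer")
print(type(q_km.magnitude))            # still a dask.array Array (lazy)
print(q_km.magnitude[:2, :2].compute())

# The non-commutativity concern: multiplying in the other order can leave
# Dask on the outside wrapping Pint, which is the wrong nesting
maybe_wrong = lazy * ureg.meter
```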
That seems about right. I think overall full support will be something we look at beyond the GEMPAK work.
Leaving a note here for future Dask compatibility work: the window smoother added in #1223 explicitly casts to ndarray, which prevents Dask compatibility for that smoother (and dependent smoothers like circular, rectangular, and n-point); see #1223 (comment).
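To illustrate why that cast matters (a generic sketch, not the smoother code itself): an explicit conversion to ndarray forces immediate computation, while NumPy functions that dispatch through the array protocols can keep the result lazy.

```python
import numpy as np
import dask.array as da

lazy = da.random.random((4000, 4000), chunks=(1000, 1000))

# An explicit ndarray cast computes and loads the full array right away
eager = np.asarray(lazy)
print(type(eager))    # numpy.ndarray

# Dispatching through NumPy's array protocols keeps the result lazy
mean = np.mean(lazy)
print(type(mean))     # dask.array Array (0-d, unevaluated)
```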
I haven't found in the documentation whether MetPy supports delayed operations with Dask. The code for unit conversion seems to access `_data_array.values`, which suggests that the entire array is loaded into memory. We have multi-GB files that require unit conversion, and ideally the converted `DataArray` would be lazily evaluated.
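The distinction being described can be seen directly on a Dask-backed `DataArray` (a small illustrative sketch with made-up names):

```python
import xarray as xr
import dask.array as da

temperature = xr.DataArray(da.zeros((500, 500), chunks=(100, 100)),
                           name="temperature")

# .values coerces to NumPy, computing and loading the whole array
print(type(temperature.values))   # numpy.ndarray

# .data returns the underlying container, leaving the graph unevaluated
print(type(temperature.data))     # dask.array Array
```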