Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid implicit .compute() calls on dask arrays #1615

Open
rpmanser opened this issue Dec 18, 2020 · 3 comments
Open

Avoid implicit .compute() calls on dask arrays #1615

rpmanser opened this issue Dec 18, 2020 · 3 comments
Labels
Type: Feature New functionality

Comments

@rpmanser
Copy link
Contributor

dask implicitly calls .compute() on an array used in a conditional statement. This is not desirable when chaining together metpy functions with dask arrays as inputs. An example of this scenario is in dewpoint_from_relative_humidity:

def dewpoint_from_relative_humidity(temperature, relative_humidity):

    if np.any(relative_humidity > 1.2):
        warnings.warn('Relative humidity >120%, ensure proper units.')
    return dewpoint(relative_humidity * saturation_vapor_pressure(temperature))

In this case I think an optional argument like check_values with a default of True would work fine. Something like:

def dewpoint_from_relative_humidity(temperature, relative_humidity, check_values=True):

    if check_values and np.any(relative_humidity > 1.2):
        warnings.warn('Relative humidity >120%, ensure proper units.')
    return dewpoint(relative_humidity * saturation_vapor_pressure(temperature))

Alternatives for similar statements in the code base will need to considered. This should be the first major step toward completing #1479.

Let's document functions from the calc module here that need modification to avoid implicit .compute() calls and discuss potential solutions.

Functions

  • dewpoint_from_relative_humidity

Solutions

  • Add optional arguments
@rpmanser rpmanser added the Type: Feature New functionality label Dec 18, 2020
@rpmanser
Copy link
Contributor Author

rpmanser commented Sep 5, 2021

I don't think it's possible to avoid implicit .compute() calls without significantly altering the codebase to specifically alleviate this problem. I'm not sure there would be a noticeable improvement in performance either. I'm going to close this, but if someone thinks of a good solution it can be reopened and revisited.

@rpmanser rpmanser closed this as completed Sep 5, 2021
@rpmanser
Copy link
Contributor Author

Reopening this issue to explore some solutions discussed with @dcamron. First thing we can try is calling np.max() on relative_humidity, then comparing that scalar to the warning threshold to hopefully retain lazy evaluation. This brings up the question of whether or not it's okay for the warning itself to be lazy, i.e., the warning would not appear here:

dewpoint = dewpoint_from_relative_humidity(temperature, relative_humidity)

but instead here:

dewpoint.compute().

I think this is okay, as dask users should understand that anything in their task graph can throw warnings and errors and that they are responsible for tracing where the warning or error occurred in their workflow.

@rpmanser rpmanser reopened this Jul 15, 2023
@dopplershift
Copy link
Member

I think the delayed warning isn't ideal but, as you pointed out, is par for the course with Dask, so that's fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New functionality
Projects
None yet
Development

No branches or pull requests

2 participants