
[sdba] Optimizations and fixes #791

Merged
merged 22 commits into master on Aug 26, 2021

Conversation

@aulemahal (Collaborator) commented Aug 10, 2021

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated (with summary of main changes)
  • bumpversion (major / minor / patch) has been called on this branch
  • Tags have been pushed (git push --tags)
  • The relevant author information has been added to .zenodo.json

What kind of change does this PR introduce?

  • Optimizes map_blocks when using coords backed by cftime.datetime objects. Dask needs some sort of hash for each input, and that tokenization is quite slow for cftime np.ndarrays because it goes through pickle.dumps (see Unidata/cftime#253). Also, CFTimeIndex objects are slow to create because of a check that cycles through all elements of the array. With many chunks and large time dimensions, this takes a significant portion of the call time (excluding the actual computation). Here we add an option to encode cftime coordinates into integers (+ attributes), as in a netCDF file, saving time on the cftime handling. The decoding is done within the task itself (thus in parallel). A sketch of the encode/decode idea follows this list.
  • Optimizes adapt_freq to skip computation on all-NaN blocks (the general pattern is sketched after this list).
  • Fixes nbu.vecquantile when rnk is NaN (a simplified analogue of the guard is shown after this list).
  • Merges all steps of DQM.adjust into a single map_blocks-wrapped function, once again to reduce the number of tasks in dask's graph. However, the adjust tasks are large and will trigger warnings about garbage collection taking a lot of CPU time. I haven't found any optimization for this part yet.
  • Better dtype conservation. Fixes in many places, including in map_blocks, which now sets the template variables to the promoted (largest) dtype of the inputs (see the dtype sketch after this list).
  • Modifies map_blocks and map_groups to consider the dimensions in Grouper.add_dims. By default, functions wrapped with map_groups assume that Grouper.DIM and Grouper.ADD_DIMS are all reduced, unless main_only=True is passed.
  • "Constant" extrapolation of adjustment factors in qm_adjust now uses values just over and under the max and min of the target data, instead of ±np.inf. This fixed a bug I hit while adjusting precipitation in ESPO-R, although I'm not completely sure why (an illustration of the new bounds is given after this list).

Does this PR introduce a breaking change?

I think not, but this PR was developed iteratively over a long time, so I'm not 100% sure.

Other information:

All these optimizations and fixes were made while processing "scénarios génériques v2", except for the add_dims change.


@aulemahal aulemahal marked this pull request as ready for review August 26, 2021 17:43
@aulemahal aulemahal requested a review from huard August 26, 2021 17:47
@aulemahal aulemahal added this to the v0.29 milestone Aug 26, 2021
@aulemahal aulemahal merged commit a6c82be into master Aug 26, 2021
@aulemahal aulemahal deleted the sdba-opt-and-fixes-sg2 branch August 26, 2021 19:00
Development

Successfully merging this pull request may close these issues.

Usage question: How to use multiple members as samples in sdba?
3 participants