[Bug]: Debug zppy diffs for v3.0.0 #931
@@ -743,7 +743,7 @@ def _add_rmse_corr_text(
     fontdict = {"fontsize": fontsize}

     if left_text_pos is None:
-        left_text_pos = (0.6335, -0.0105)
+        left_text_pos = (0.6635, -0.0105)
Aligns with `panel[n][0] + 0.6635,`
I suppose this will fix the other 3-panel plots that use `lat_lon_plot.py`? I do see the same problem in other sets where the RMSE and CORR strings are misplaced; some overlap with the figures.
Okay, I checked: `lat_lon`, `zonal_mean_2d/stratosphere`, `meridional_mean_2d`, and `polar` will all be fixed.
@chengzhuzhang For the
Root cause
To determine the lower resolution for regridding,
Debugging
Test var has a shape of (120, 360) while ref var has a shape of (120, 288).
Comparing:
* /lcrc/group/e3sm/public_html/cdat-migration-fy24/25-02-04-branch-930-zppy-diffs/lat_lon/OMI-MLS/OMI-MLS-TCO-ANN-60S60N_diff.nc
* /lcrc/group/e3sm/public_html/cdat-migration-fy24/25-02-04-main-zppy-diffs/lat_lon/OMI-MLS/OMI-MLS-TCO-ANN-60S60N_diff.nc
* var_key: TCO
Not equal to tolerance rtol=0.0001, atol=0
(shapes (120, 360), (120, 288) mismatch)
ACTUAL: array([[4.909916, 5.552422, 6.897028, ..., 7.011181, 8.307238, 6.755878],
[7.468704, 7.0319 , 7.010069, ..., 4.298189, 5.202745, 6.788332],
[5.496756, 6.867319, 7.179531, ..., 5.23127 , 5.508911, 6.141758],...
DESIRED: array([[2.213444, 1.642891, 2.503925, ..., 5.836899, 5.549273, 6.050253],
[1.589104, 2.051823, 1.479935, ..., 5.849903, 5.876003, 5.855825],
[1.357033, 1.710564, 1.29594 , ..., 5.891077, 6.22814 , 5.934219],...
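The report above has the format produced by `numpy.testing.assert_allclose`, which rejects arrays of different shapes before it compares any values. The following is a minimal sketch of a wrapper that separates a shape mismatch from a value mismatch. The helper name `compare_arrays` and the zero-filled arrays are hypothetical; only the shapes and tolerances come from the output above.

```python
import numpy as np


def compare_arrays(actual, desired, rtol=1e-4, atol=0.0):
    """Return a short status string instead of raising (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)

    # Arrays on different grids cannot be compared element by element.
    if actual.shape != desired.shape:
        return f"shape mismatch: {actual.shape} vs {desired.shape}"

    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)
    except AssertionError:
        return "values differ"
    return "match"


# Placeholder data with the two shapes reported above.
dev_diff = np.zeros((120, 360))
main_diff = np.zeros((120, 288))
status = compare_arrays(dev_diff, main_diff)
print(status)
```

A shape mismatch like this one means the two `diff.nc` files sit on different grids, so a value comparison is only meaningful after regridding one of them onto the other's grid.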
@tomvothecoder Thank you for the nice work so far. I walked through the other sets and noticed problems that we should investigate and try to fix. viewer/: for the Taylor diagram, the results for TREFHT land are off; this should be related to different masking. Cosmetic.
I checked the diff figures. Despite the resolution difference, the metrics and results are very close. I think we can address #739 at a later time.
Description
Summary of Changes
`zppy` diffs for v3.0.0 #930

TLDR: Conclusion
I'm confident that v3.0.0 is working correctly since the test (model) and reference (observation) files align. The differences in the "Model - Observation" plots are expected because:
`diff.nc`
`diff.nc` is computed as test minus reference, so any small discrepancy can produce large differences, which then appear in the plot.

Debugging Overview
Some "Model - Observation" subplots show large differences between E3SM Diags v2.12.1 and v3.0.0rc2. These plots use the `diff.nc` file (test - ref). We need to determine why the plot differences are so large in some cases.

1. First, let's compare the output files for test (`test.nc`) and reference (`ref.nc`) to see if these align between `e3sm_diags` versions.

Source: `regression_nc.ipynb`. With relative tolerance of 1e-5 (`rtol=1e-5`):
`ERA5-TREFHT-land`, `MERRA2-TAUXY-land`, and `MERRA2-TREFHT-land`
`lat_lon` set with run script and debug any issues #794

Great news! The test and reference files are within tolerance, so the first two of three plots show no image diffs. Now, we need to analyze `diff.nc` for differences in the third plot, "Model - Observation."

I asked ChatGPT two questions:
Can floating point comparison between the differences of two floating point arrays result in large differences?
I have a plot called "Model - Observation" which plots the difference of two floating point arrays (model and observation). If I compare the difference in the plots between two branches (main vs. development), should I expect large difference due to shifting nan positions from regridding even if nan counts are the same?
These explanations reinforce that differences are expected in some "Model-Observation" plots.
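To make the reasoning concrete, here is a small sketch with made-up numbers (none of them come from the actual runs). It shows two effects: inputs that agree within `rtol=1e-5` can still produce differences that disagree by a large relative amount, and two arrays can hold the same number of `nan` values in different positions.

```python
import numpy as np

# Hypothetical model/observation values from the "old" codebase.
model_old = np.array([300.000, 280.000, 290.000])
obs_old = np.array([299.999, 279.998, 289.995])

# Hypothetical "new" codebase: the model differs by a relative 1e-6.
model_new = model_old * (1 + 1e-6)
obs_new = obs_old.copy()

# The inputs agree comfortably within rtol=1e-5.
inputs_close = np.allclose(model_new, model_old, rtol=1e-5, atol=0)

# The differences are tiny numbers, so the same absolute change is now
# a large relative change (about 0.0013 vs 0.0010 for the first element).
diff_old = model_old - obs_old
diff_new = model_new - obs_new
diffs_close = np.allclose(diff_new, diff_old, rtol=1e-5, atol=0)

# Same nan count, different nan positions (e.g. after regridding).
a = np.array([1.0, np.nan, 3.0, 4.0])
b = np.array([1.0, 2.0, np.nan, 4.0])
same_nan_count = np.isnan(a).sum() == np.isnan(b).sum()
same_nan_positions = np.array_equal(np.isnan(a), np.isnan(b))

print(inputs_close, diffs_close, same_nan_count, same_nan_positions)
```

This is why a passing check on `test.nc` and `ref.nc` does not imply a passing check on `diff.nc` at the same relative tolerance.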
Comparing `diff.nc` between codebases is unreliable, but let's still group image diffs by suspected or known causes.

1. Expected due to bug found on old CDAT codebase (here)
Affected variables:
2. Expected due to RMSE/CORR positioning bug (fixed in this PR)
Affected variables:
3. Expected due to `nan` location mismatches, caused by regridding differences between xCDAT/xESMF and CDAT/ESMF when downscaling variables.
Affected variables:
4. Expected due to land-sea mask differences between xCDAT/xESMF and CDAT/ESMF. Details:
def _apply_land_sea_mask(
    ds: xr.Dataset,
    ds_mask: xr.Dataset,
    var_key: str,
    region: Literal["land", "ocean"],
    regrid_tool: str,
    regrid_method: str,
) -> xr.Dataset:
    """Apply a land or sea mask based on the region ("land" or "ocean").

    Parameters
    ----------
    ds: xr.Dataset
        The dataset containing the variable.
    ds_mask : xr.Dataset
        The dataset containing the land sea region mask variable(s).
    var_key : str
        The key the variable
    region : Literal["land", "ocean"]
        The region to mask.
    regrid_tool : {"esmf", "xesmf", "regrid2"}
        The regridding tool to use. Note, "esmf" is accepted for backwards
        compatibility with e3sm_diags and is simply updated to "xesmf".
    regrid_method : str
        The regridding method to use. Refer to [1]_ for more information on
        these options.

        esmf/xesmf options:
          - "bilinear"
          - "conservative"
          - "conservative_normed" -- equivalent to "conservative" in cdms2 ESMF
          - "patch"
          - "nearest_s2d"
          - "nearest_d2s"

        regrid2 options:
          - "conservative"

    Returns
    -------
    xr.Dataset
        The Dataset with the land or sea mask applied to the variable.
    """
    # TODO: Remove this conditional once "esmf" references are updated to
    # "xesmf" throughout the codebase.
    if regrid_tool == "esmf":
        regrid_tool = "xesmf"

    # TODO: Remove this conditional once "conservative" references are updated
    # to "conservative_normed" throughout the codebase.
    # NOTE: this is equivalent to "conservative" in cdms2 ESMF. If
    # "conservative" is chosen, it is updated to "conservative_normed". This
    # logic can be removed once the CoreParameter.regrid_method default
    # value is updated to "conservative_normed" and all sets have been
    # refactored to use this function.
    if regrid_method == "conservative":
        regrid_method = "conservative_normed"

    # A dictionary storing the specifications for this region.
    specs = REGION_SPECS[region]

    # If the region is land or ocean, regrid the land sea mask to the same
    # shape (lat x lon) as the variable then apply the mask to the variable.
    # Land and ocean masks have a region value which is used as the upper limit
    # for masking.
    ds_new = ds.copy()
    ds_new = _drop_unused_ilev_axis(ds)
    output_grid = ds_new.regridder.grid
    mask_var_key = _get_region_mask_var_key(ds_mask, region)

    ds_mask_new = _drop_unused_ilev_axis(ds_mask)
    ds_mask_regrid = ds_mask_new.regridder.horizontal(
        mask_var_key,
        output_grid,
        tool=regrid_tool,
        method=regrid_method,
    )

    # Update the mask variable with a lower limit. All values below the
    # lower limit will be masked.
    land_sea_mask = ds_mask_regrid[mask_var_key]
    lower_limit = specs["value"]  # type: ignore
    cond = land_sea_mask > lower_limit

    # Apply the mask with a condition (`cond`) using `.where()`. Note, the
    # condition matches values to keep, not values to mask out, `drop` is
    # set to False because we want to preserve the masked values (`np.nan`)
    # for plotting purposes.
    masked_var = ds_new[var_key].where(cond=cond, drop=False)
    ds_new[var_key] = masked_var

    return ds_new
Source: e3sm_diags/e3sm_diags/driver/utils/regrid.py, lines 177 to 268 in 9b92f8f
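As a rough illustration of why two regridders can yield different masks, the sketch below applies the same threshold rule to two slightly different regridded land fractions. It uses `np.where` as a stand-in for the `.where(cond=cond, drop=False)` call in the function above. All values, including the 0.5 threshold, are assumptions for illustration; the actual `REGION_SPECS` values are not shown in this PR.

```python
import numpy as np

# Hypothetical strip of grid cells crossing a coastline.
variable = np.array([280.0, 281.0, 282.0, 283.0, 284.0])

# Hypothetical land fractions after regridding the mask with two tools.
# Only the coastal cell (index 2) differs, and only slightly.
landfrac_tool_a = np.array([0.90, 0.70, 0.51, 0.20, 0.00])
landfrac_tool_b = np.array([0.90, 0.70, 0.49, 0.20, 0.00])

LOWER_LIMIT = 0.5  # assumed threshold, not taken from REGION_SPECS


def apply_mask(values, mask_fraction, lower_limit):
    """Keep cells whose mask fraction exceeds the limit; set others to nan."""
    return np.where(mask_fraction > lower_limit, values, np.nan)


masked_a = apply_mask(variable, landfrac_tool_a, LOWER_LIMIT)
masked_b = apply_mask(variable, landfrac_tool_b, LOWER_LIMIT)

# Cells that are masked in one result but not in the other.
mismatch = np.isnan(masked_a) != np.isnan(masked_b)
print(mismatch)
```

A difference of 0.02 in the regridded fraction is enough to flip a coastal cell between kept and masked, which then shows up as a `nan` location mismatch in the land-only or ocean-only variables.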
Affected variables:
5. Expected due to differences between xCDAT/xESMF vs. CDAT/ESMF when downscaling variables. These affected variables have large diffs, so I did further investigation.
Affected variables:
Further debugging to validate that the diffs are expected for the affected variables:
Results: All pass
`diff.nc` files used for "Model - Observation" plot
Result: Large differences
`diff.nc` files
Results -- All close enough to be comfortable (I think)
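The file-level checks summarized above depend on pairing each output file from one run with its counterpart from the other run. The paths quoted earlier in this PR share the same relative layout under two roots, so a sketch of that pairing step could look like the following. The function name `pair_nc_files` is hypothetical, and this is not the code in the notebooks.

```python
from pathlib import Path


def pair_nc_files(dev_root, main_root, suffix):
    """Pair files ending in `suffix` that exist under both roots.

    Files are matched by their path relative to each root. Returns the
    matched (dev, main) pairs and the relative paths missing from main.
    """
    dev_root, main_root = Path(dev_root), Path(main_root)
    pairs, missing = [], []

    for dev_path in sorted(dev_root.rglob(f"*{suffix}")):
        rel_path = dev_path.relative_to(dev_root)
        main_path = main_root / rel_path
        if main_path.is_file():
            pairs.append((dev_path, main_path))
        else:
            missing.append(rel_path)

    return pairs, missing


# Usage (roots are placeholders for the two result directories):
# pairs, missing = pair_nc_files("<dev results>", "<main results>", "_diff.nc")
```

Each pair can then be opened and compared with the tolerance of choice, and the `missing` list flags files that one run produced and the other did not.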
Step-by-step for Debugging
* `.cfg` that isolates the variables below
* `run_script` using v2.12.1 and v3.0.0rc2
* `.nc` differences for `test.nc` and `ref.nc` files (`regression_nc.ipynb`)
* `.nc` differences for `diff.nc` files (`regression_nc_diffs.ipynb`)

Checklist
If applicable: