Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace deprecated SciPy cwt with PyWavelets cwt #913

Merged
merged 7 commits into from
Feb 11, 2025
Merged

Conversation

tomvothecoder
Copy link
Collaborator

@tomvothecoder tomvothecoder commented Jan 15, 2025

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder self-assigned this Jan 15, 2025
@tomvothecoder tomvothecoder added the bug Bug fix (will increment patch version) label Jan 15, 2025
Comment on lines 416 to 420
widths = deg / (2 * np.pi / period)

widths = deg / (2 * np.pi * freq)
cwtmatr = scipy.signal.cwt(data, scipy.signal.morlet2, widths=widths, w=deg)
psd = np.mean(np.square(np.abs(cwtmatr)), axis=1)
[cfs, freq] = pywt.cwt(data, scales=widths, wavelet="cmor1.5-1.0")
psd = np.mean(np.square(np.abs(cfs)), axis=1)
period = 1 / freq
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes are based on this comment

@tomvothecoder tomvothecoder marked this pull request as ready for review January 15, 2025 22:42
Copy link
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang The integration tests still pass. I will do a regression test to compare against v2.12.1 when Perlmutter is back online.

Comment on lines 416 to 421
widths = deg / (2 * np.pi / period)

widths = deg / (2 * np.pi * freq)
cwtmatr = scipy.signal.cwt(data, scipy.signal.morlet2, widths=widths, w=deg)
psd = np.mean(np.square(np.abs(cwtmatr)), axis=1)
[cfs, freq] = pywt.cwt(data, scales=widths, wavelet="cmor1.5-1.0")
psd = np.mean(np.square(np.abs(cfs)), axis=1)
period = 1 / freq

return (period, psd)
Copy link
Collaborator Author

@tomvothecoder tomvothecoder Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @chengzhuzhang, regression testing is complete with the code changes. I'm finding large differences with the QBO wavelet plots, specifically affected by the function return values of (period, psd).

Dev Main Diff
dev main Diff

Let's push this PR back to the E3SM-Unified RC testing period for more debugging and proceed with releasing v3.0.0rc1 (pending #880).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we need some review and fine-tuning. Aiming for next rc is reasonable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder the magnitude change seems consistent with what I found comparing these methods.

Is the period actually different? It looks like it's the same, but the x-axis on the diff plot is hard to read, so I can't figure out what I'm looking at.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our expert is here. I think the period is the same. We just need to use integer as X-axis label to be clear. Should we adjust Y axis range (variance) @whannah1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A larger Y-axis range is probably a good idea

Copy link
Collaborator Author

@tomvothecoder tomvothecoder Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the period actually different? It looks like it's the same, but the x-axis on the diff plot is hard to read, so I can't figure out what I'm looking at.

The wave period is slightly different between the code.

PyWavelets (converted from float64 to int64)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
       33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 42, 43, 44, 45, 46, 47, 48,
       49, 50, 51, 52])

SciPy

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh weird, I don't remember seeing that before... 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm in your original function below I noticed you have longest_period defined and period defined twice.

def get_psd_from_wavelet_pywt(data):
   deg = 6
   period = np.arange(1, longest_period + 1)
   widths = deg / ( 2 * np.pi / period )
   [cfs, freq] = pywt.cwt(data, scales=widths, wavelet='cmor1.5-1.0')
   psd = np.mean( np.square( np.abs(cfs) ), axis=1)
   period = 1 / freq
   return (period, psd)

While in the e3sm_diags codebase we have period defined only once with 55 set in place of longest_period.

period = np.arange(1, 55 + 1)

period = np.arange(1, 55 + 1)
widths = deg / (2 * np.pi / period)
widths = period
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to play with pywt.cwt a little bit more, the only way to specify period is to use the values of period as widths/scales. Here is what I get compare to original plots:
with pywt:
image
with scipy:
image

My understand we don't really care about matching variance between two versions of code. @whannah1 can you advice if the code change is reasonable?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why you made that change in how the widths are specified, but to your question about variance, as long as both obs and model are treated consistently then we only need to know the relative different in the peaks. The absolute value of the peak height generally isn't useful for assessing/tuning the QBO. I'm sure one of the plots above is technically more "correct", but I have no idea how to figure which one that is.

Copy link
Contributor

@chengzhuzhang chengzhuzhang Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read about cwt function from scipy and pywt, the fundamental equation are slightly different, also the definition of the input widths/scale. The only way to maintain an evenly spaced x axis (peirod), as in the scipy approach, is to use evenly space scale. Based on what I read, it is pretty controversial as which cwt is more accurate. I think we can use this version for now, as it produces good match with the FFT results on peak periods? @whannah1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I think either version fits our current needs despite the accuracy issue. The issue of even spacing is also not that critical as long as we have sufficient "resolution" of the peak period to generally distinguish a different in period of ~2 months or more. I feel that we have everything we need from the current method. I say we get this merged quickly to address the library deprecation issue and revisit the accuracy of the metric in the future as needed.

Copy link
Contributor

@chengzhuzhang chengzhuzhang Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! We will merge soon. Thank you @whannah1.

@tomvothecoder
Copy link
Collaborator Author

Thank you @chengzhuzhang for the work and @whannah1 for helping!

I will resolve the merge conflicts and merge this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug fix (will increment patch version)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: AttributeError: module 'scipy.signal' has no attribute 'cwt' (deprecated, use PyWavelets instead)
3 participants