Explore parallels of nse and R^2 #832

Open
nikeethr opened this issue Feb 28, 2025 · 1 comment
Labels
investigation For issues that need extra research before implementation. refactoring
Milestone

Comments

@nikeethr
Collaborator

nikeethr commented Feb 28, 2025

$R^2$ has a similar (if not the same) formula as nse. Note that $R^2$'s context is different: it is popularly (but not always) used for linear regression, where a least squares fit with an intercept cannot perform worse than the baseline of always predicting the mean. Hence why, typically, $0 \leq R^2 \leq 1$, although in reality the codomain of the two functions is the same: $-\infty < R^2 = \text{nse} \leq 1$ (citation required).
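As a quick numerical illustration of that range, here is a minimal sketch assuming the usual definition $1 - \sum_i (Y_i - X_i)^2 / \sum_i (X_i - \bar{X})^2$. The function name `nse` and the toy data are made up for this example and are not the `scores` API.

```python
import numpy as np


def nse(fcst, obs):
    """1 - (sum of squared errors) / (sum of squared deviations of obs from its mean)."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sse = np.sum((fcst - obs) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - sse / sst


obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = nse(obs, obs)                                # forecast equals the observations
climatology = nse(np.full_like(obs, obs.mean()), obs)  # forecast is the observed mean
poor = nse(obs[::-1], obs)                             # forecast is the observations reversed

print(perfect, climatology, poor)
```

The three cases land at the upper bound, at zero (the mean baseline), and below zero respectively, which is the behaviour a forecast that is not a least squares fit can show.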

From an implementation point of view, there is:

  1. an opportunity to refactor, include $R^2$ and reuse the same logic for nse.

  2. a further opportunity, if we do proceed with 1., to make mse more efficient, e.g. using numba or rust, as this function seems to be pivotal in the construction of both metrics.

    (note: use of rust, numba or even opencl for heterogeneous compute is something that I briefly explored with Fractions Skill Score (FSS) and found significant performance gains, but it is currently in an experimental branch)
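One possible shape for such a refactor, sketched with plain NumPy. All names here (`_mse`, `nse`, `r2`, the `mse_fn` parameter) are hypothetical and chosen for illustration; they do not reflect the existing `scores` code.

```python
import numpy as np


def _mse(fcst, obs):
    """Low-level kernel (hypothetical): mean squared error over all elements."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((fcst - obs) ** 2))


def nse(fcst, obs, mse_fn=_mse):
    """NSE written as 1 - MSE(fcst, obs) / MSE(mean(obs), obs)."""
    obs = np.asarray(obs, dtype=float)
    baseline = np.full_like(obs, obs.mean())
    return 1.0 - mse_fn(fcst, obs) / mse_fn(baseline, obs)


def r2(fcst, obs, mse_fn=_mse):
    """R^2 reuses the NSE logic unchanged; only the name differs."""
    return nse(fcst, obs, mse_fn=mse_fn)


obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
fcst = np.array([1.5, 2.0, 2.5, 4.0, 5.5])

# Swapping the kernel: a traced wrapper stands in for a faster backend.
calls = []


def traced_mse(f, o):
    calls.append(1)
    return _mse(f, o)


score = nse(fcst, obs)
score_r2 = r2(fcst, obs, mse_fn=traced_mse)
```

Passing the kernel as an argument is only one way to make the backend swappable; the point is that both metrics go through the same mse call, so speeding that up benefits both.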


Note

Incidentally, and this needs verifying, this is also the reduced form of the square of Pearson's correlation ($\rho^2$), if the model used is a least squares regression (LSR). This is because the objective of LSR is to minimize the sum of squared errors $E_i^2$, where $E_i = Y_i - X_i$, $X_i$ is the observation and $Y_i$ is the fitted model datum. At that minimum (for a fit that includes an intercept), $E_i$ has zero mean and zero covariance with $Y_i$, and under those conditions $\rho^2$ reduces to $R^2$.
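A small numerical check of this claim, as a sketch on synthetic data (the data, seed and variable names are assumptions for illustration). It fits a straight line with `np.polyfit`, then compares $R^2$ with $\rho^2$, and repeats the comparison for a deliberately rescaled forecast that is no longer a least squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
predictor = rng.normal(size=200)
obs = 2.0 * predictor + 1.0 + rng.normal(scale=0.5, size=200)  # X_i

# Least squares straight-line fit with an intercept; Y_i are the fitted values.
slope, intercept = np.polyfit(predictor, obs, deg=1)
fitted = slope * predictor + intercept
error = fitted - obs  # E_i = Y_i - X_i


def r2_score(fcst, obs):
    return 1.0 - np.sum((fcst - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rho2_score(fcst, obs):
    return np.corrcoef(fcst, obs)[0, 1] ** 2


r2_lsr = r2_score(fitted, obs)
rho2_lsr = rho2_score(fitted, obs)
cov_error_fitted = np.cov(error, fitted)[0, 1]

# Not a least squares fit any more: same correlation, different R^2.
rescaled = 1.5 * fitted + 2.0
r2_rescaled = r2_score(rescaled, obs)
rho2_rescaled = rho2_score(rescaled, obs)

print(r2_lsr, rho2_lsr, cov_error_fitted)
print(r2_rescaled, rho2_rescaled)
```

For the fitted line the two scores agree to floating point precision, while the rescaled forecast keeps the same $\rho^2$ but loses $R^2$, since correlation is blind to bias and scale errors.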

This may have interesting applications when comparing both scores, $R^2$ (or NSE) and $\rho^2$, for a "naive" optimizer like least squares vs. a physical model. We may be able to understand how much explainability a model is able to capture over least squares (or linear) regression, and under what circumstances.

i.e. while $R^2$ tells us how much of the variance in the data is explained by a model, comparing it with an LSR fit will tell us how much of the variance that is attributed to "non-linearity" is explained by the model. This is not a guarantee, as the scores can be similar for different reasons depending on what is being optimized (one may excel at linear regression and the other at non-linear patterns), though I suspect bringing $\rho$ into the picture may help with that. It is likely that something like this already exists in the literature though.
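To make the comparison concrete, here is a rough sketch on synthetic data. Everything in it is an assumption for illustration: the sinusoidal "truth", the stand-in "physical" model that happens to know the right shape, and the straight-line least squares baseline fitted against time.

```python
import numpy as np


def r2(fcst, obs):
    return 1.0 - np.sum((fcst - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rho2(fcst, obs):
    return np.corrcoef(fcst, obs)[0, 1] ** 2


rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 400)
obs = np.sin(t) + rng.normal(scale=0.1, size=t.size)

# Stand-in "physical" model (assumption): captures the non-linear shape.
model = np.sin(t)

# "Naive" baseline: least squares straight line fitted to obs against t.
slope, intercept = np.polyfit(t, obs, deg=1)
lsr = slope * t + intercept

scores = {
    "model": {"r2": r2(model, obs), "rho2": rho2(model, obs)},
    "lsr": {"r2": r2(lsr, obs), "rho2": rho2(lsr, obs)},
}

# Variance explained by the model beyond what the linear baseline explains.
gain_over_lsr = scores["model"]["r2"] - scores["lsr"]["r2"]
print(scores, gain_over_lsr)
```

Here `gain_over_lsr` is one candidate for the "non-linearity explained" quantity described above; whether it is the right one, and how $\rho^2$ should enter the picture, is exactly what the investigation would need to settle.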

Originally posted by @nikeethr in #815 (comment)

@nikeethr changed the title from "Explore parallels of nse and $R^2$" to "Explore parallels of nse and R^2" on Feb 28, 2025
@nikeethr added the refactoring and investigation labels on Feb 28, 2025
@nikeethr
Collaborator Author

nikeethr commented Feb 28, 2025

Note

#773, i.e. Spearman's correlation, uses Pearson's correlation, e.g. xr.corr. But this is also used for Pearson's in scores. It may be worth consolidating other areas where these kinds of similarities exist, especially for more basic scoring functions like mse and xr.corr.


In other words, I think it's worth separating the low level computations from the higher level scores (on a case by case basis initially anyway). This is especially useful if down the line we want to provide alternatives with a numba or rust backend.
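The Spearman and Pearson relationship mentioned above is a convenient example of that layering. The sketch below uses plain NumPy rather than `xr.corr`, and the names `pearson`, `_ranks` and `spearman` are hypothetical. It also assumes there are no ties in the data, which a real implementation would have to handle (typically by averaging ranks).

```python
import numpy as np


def pearson(a, b):
    """Low-level kernel (hypothetical): Pearson correlation of two 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_dev = a - a.mean()
    b_dev = b - b.mean()
    return float(np.sum(a_dev * b_dev) / np.sqrt(np.sum(a_dev ** 2) * np.sum(b_dev ** 2)))


def _ranks(values):
    """Ranks 1..n of a 1-D array, assuming no tied values."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


def spearman(a, b):
    """Higher-level score: Pearson correlation applied to the ranks."""
    return pearson(_ranks(np.asarray(a)), _ranks(np.asarray(b)))


fcst = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
obs = fcst ** 3  # monotonic but non-linear relationship

pearson_score = pearson(fcst, obs)
spearman_score = spearman(fcst, obs)
print(pearson_score, spearman_score)
```

With this split, only `pearson` would need a numba or rust counterpart for both scores to benefit.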

A similar example can be seen in the fractions skill score (fss), where the "integral sum" is actually computed separately and, in theory, can be refactored out and used on its own for any higher level score that needs to perform multidimensional sliding window sums.
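For readers unfamiliar with the technique, here is a minimal 2-D sketch, assuming the "integral sum" refers to a summed-area table (integral image). The function names are invented for this example and this is not the fss code in `scores`.

```python
import numpy as np


def integral_image(field):
    """Summed-area table, padded with a leading row and column of zeros."""
    table = np.zeros((field.shape[0] + 1, field.shape[1] + 1), dtype=float)
    table[1:, 1:] = field.cumsum(axis=0).cumsum(axis=1)
    return table


def window_sums(field, size):
    """Sum over every size x size window that fits fully inside a 2-D field."""
    t = integral_image(np.asarray(field, dtype=float))
    # Each window sum needs only four lookups in the table.
    return t[size:, size:] - t[:-size, size:] - t[size:, :-size] + t[:-size, :-size]


field = np.arange(12.0).reshape(3, 4)
sums = window_sums(field, 2)

# Brute-force reference for comparison.
brute = np.array(
    [[field[i:i + 2, j:j + 2].sum() for j in range(3)] for i in range(2)]
)
print(sums)
```

Once the table is built, the cost per window is independent of the window size, which is why a helper like this can be shared by any score that needs sliding window sums.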

@tennlee added this to the Wishlist milestone on Apr 15, 2025

2 participants