Bayesian SINDy #440

Merged
merged 60 commits from bayesian_sindy into master on Feb 20, 2024

Changes from 46 of 60 commits

Commits
79a4fa0
Added Sparse Bayesian Regression (SBR) optimizer
mikkelbue Dec 13, 2023
5c5ee4e
Added Bayesian SINDy notebook
mikkelbue Dec 13, 2023
acab720
Blacked Sparse Bayesian Regression code
mikkelbue Dec 13, 2023
66112bb
Renamed BayesianSparseRegression to SparseBayesianRegression
mikkelbue Dec 13, 2023
cc89101
Added numpyro as an optional dependency
mikkelbue Dec 14, 2023
22fac95
Removed reference to PEP 690 from SBR import
mikkelbue Dec 14, 2023
a681ae8
Renamed SBR hyperparameters to something more interpretable
mikkelbue Dec 14, 2023
8da636a
Renamed Bayesian SINDy notebook
mikkelbue Dec 14, 2023
f209b54
Moved methods from SparseBayesianRegression into SBR optimizer
mikkelbue Dec 14, 2023
5b60afe
Vectorized sampling of SINDy coefficients in numpyro for optimization
mikkelbue Dec 21, 2023
b827dab
Added docstring to Sparse Bayesian Regression optimizer
mikkelbue Dec 21, 2023
f56e9e1
Added some documentation and testing mode to the Bayesian SINDy notebook
mikkelbue Jan 8, 2024
5ab7932
Added jax as optional dependency for Bayesian SINDy
mikkelbue Jan 9, 2024
83c6a17
Added numpyro dependency group to github workflow
mikkelbue Jan 10, 2024
8c1a923
Removed misplaced space in github workflow
mikkelbue Jan 10, 2024
66c0f15
Fixed bug in SBR docstring.
mikkelbue Jan 11, 2024
754722a
Removed reference in docstring.
mikkelbue Jan 11, 2024
1f17975
Update pyproject.toml
mikkelbue Jan 22, 2024
b391898
Update pysindy/optimizers/sbr.py
mikkelbue Jan 22, 2024
cc30881
Update pysindy/optimizers/sbr.py
mikkelbue Jan 22, 2024
f0a6823
Rename numpyro dependency group to sbr
mikkelbue Jan 22, 2024
80fc151
Remove reference to numpyro implementation and add horseshoe descript…
mikkelbue Jan 22, 2024
cacb02b
Reorganize SBR kwargs for MCMC and super
mikkelbue Jan 22, 2024
12e5677
Reran SBR notebooks after updating the kwargs logic
mikkelbue Jan 22, 2024
7af3848
Moved horseshoe hyperparameter sampling into _numpyro_model routine
mikkelbue Jan 22, 2024
01698be
Updated documentation for SBR after refactoring the kwargs
mikkelbue Jan 22, 2024
952b989
Removed JAX key splitting since it's not necessary for SBR
mikkelbue Jan 23, 2024
ae9325b
Renamed mcmc to mcmc_ in SBR optimizer
mikkelbue Jan 23, 2024
890ad58
Reran SBR notebooks after recent updates
mikkelbue Jan 23, 2024
6e9b003
Added type hints to SBR
mikkelbue Jan 23, 2024
28dd0f3
Trip initialisation of SBR if unbias=True, since it's incompatible
mikkelbue Jan 23, 2024
8df9665
Added discrete time example
mikkelbue Jan 25, 2024
52b9070
Changed discrete time example to harmonic oscillator
mikkelbue Jan 29, 2024
ab2641b
Add note about the SBR modelling assumptions in the docstring
mikkelbue Jan 30, 2024
0b25e65
Added some more clarification to SBR docstring
mikkelbue Feb 5, 2024
57f4178
Added SBR optimizer to a bunch of tests
mikkelbue Feb 5, 2024
98fa0fd
Merge branch 'master' into bayesian_sindy
mikkelbue Feb 5, 2024
17f4d65
Removed strict versioning for numpyro
mikkelbue Feb 5, 2024
85a5db1
Remove reference marker from mcmc in SBR docstring
mikkelbue Feb 6, 2024
1ec2523
Added some bad argument value checking to SBR
mikkelbue Feb 6, 2024
73c2fd5
Added tests for bad arguments to SBR optimizer
mikkelbue Feb 6, 2024
5ff20a1
Black test_optimizers.py
mikkelbue Feb 6, 2024
8c0c019
Added simple test of SBR fitting
mikkelbue Feb 6, 2024
4fb02a0
Black test_optimizers.py
mikkelbue Feb 6, 2024
9ef48d7
Added pickle test for SBR optimizer
mikkelbue Feb 6, 2024
d8b93b1
Updated test_sbr_fit so Flake8 isn't tripped
mikkelbue Feb 6, 2024
eced555
Rephrase docstring of SBR for clarity.
mikkelbue Feb 12, 2024
ad38749
Rephrase description of "slab_shape_nu" in SBR for clarity
mikkelbue Feb 12, 2024
cd4ecee
Rephrase description of "slab_shape_s" in SBR for clarity
mikkelbue Feb 12, 2024
54422da
Add docstring and type hints to "_sample_reg_horseshoe"
mikkelbue Feb 12, 2024
ed358cb
Remove superfluous function definition from "_sample_reg_horseshoe"
mikkelbue Feb 12, 2024
b2579bb
Fixed bug in type hints for "_sample_reg_horseshoe"
mikkelbue Feb 12, 2024
a13ce82
Changed SBR test to also test for coefficient values
mikkelbue Feb 12, 2024
6426f6c
Removed discrete example from Bayesian SINDy
mikkelbue Feb 13, 2024
f20fd1e
Added comment about the limitations of the current SBR method to exam…
mikkelbue Feb 13, 2024
3c37bf0
Added SBR example script
mikkelbue Feb 13, 2024
6bf64e3
Ran publish_notebook.py on SBR example
mikkelbue Feb 13, 2024
9aeadea
Tidied up and reran SBR notebook example after publishing it.
mikkelbue Feb 13, 2024
b855006
Added arviz to SBR dependencies
mikkelbue Feb 13, 2024
2060875
Added high-level import of SBR
mikkelbue Feb 13, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -40,7 +40,7 @@ jobs:
- name: Install dependencies
run: |
pip install --upgrade pip
- pip install .[dev,miosr,cvxpy]
+ pip install .[dev,miosr,cvxpy,sbr]
- name: Build the docs
# Not exactly how RTD does it, but close.
run: |
764 changes: 764 additions & 0 deletions examples/19_bayesian_sindy/example-discrete.ipynb

Large diffs are not rendered by default.

799 changes: 799 additions & 0 deletions examples/19_bayesian_sindy/example.ipynb

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions pyproject.toml
@@ -61,6 +61,10 @@ cvxpy = [
"cvxpy",
"scs>=2.1, !=2.1.4"
]
sbr = [
"numpyro",
"jax"
]

[tool.black]
line-length = 88
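With the new "sbr" dependency group, the optional numpyro and jax requirements can presumably be installed via the package extra (a hypothetical command, assuming a pysindy release that ships this group; arviz is added to the group in a later commit):

pip install pysindy[sbr]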
5 changes: 5 additions & 0 deletions pysindy/optimizers/__init__.py
@@ -19,6 +19,10 @@
from .stable_linear_sr3 import StableLinearSR3
except ImportError:
pass
try:
from .sbr import SBR
except ImportError:
pass

(Codecov / codecov/patch warning: added lines #L24-L25 in pysindy/optimizers/__init__.py were not covered by tests.)
from .wrapped_optimizer import WrappedOptimizer
from .sr3 import SR3
from .ssr import SSR
@@ -37,4 +41,5 @@
"FROLS",
"SINDyPI",
"MIOSR",
"SBR",
]
189 changes: 189 additions & 0 deletions pysindy/optimizers/sbr.py
@@ -0,0 +1,189 @@
from typing import Optional

import jax.numpy as jnp
import numpy as np
import numpyro
from jax import random
from numpyro.distributions import Exponential
from numpyro.distributions import HalfCauchy
from numpyro.distributions import InverseGamma
from numpyro.distributions import Normal
from numpyro.infer import MCMC
from numpyro.infer import NUTS

from .base import BaseOptimizer


class SBR(BaseOptimizer):
"""
Sparse Bayesian Regression (SBR) optimizer. This uses the regularised
horseshoe prior over the SINDy coefficients to achieve sparsification.

The horseshoe prior places a "spike" of probability mass at the origin,
which shrinks most coefficients to zero, and a heavy-tailed "slab" that
accommodates the coefficients that are genuinely nonzero.

The SINDy coefficients are set to the posterior means of the MCMC (NUTS)
samples. Additional statistics can be computed from the samples stored in
the mcmc_ attribute using e.g. ArviZ.

This implementation differs from the method described in Hirsh et al. (2021)
by imposing the error model directly on the derivatives, rather than on the
states, circumventing the need to integrate the equation to evaluate the
posterior density. One consequence is that the noise standard deviation
"sigma" refers to the derivatives rather than the states, and hence should
not be interpreted as measurement noise on the observed states.

TODO: Implement the data-generating model described in eq. 2.4 of Hirsh
et al. (2021). This could be achieved using the JAX-based solver "diffrax".
See the discussion in https://github.com/dynamicslab/pysindy/pull/440 for more
details.

See the following reference for more details:

Hirsh, S. M., Barajas-Solano, D. A., & Kutz, J. N. (2021).
Sparsifying Priors for Bayesian Uncertainty Quantification in
Model Discovery (arXiv:2107.02107). arXiv. http://arxiv.org/abs/2107.02107

Parameters
----------
sparsity_coef_tau0 : float, optional (default 0.1)
Sparsity coefficient for the regularised horseshoe hyper-prior. Lower
values increase the sparsity of the SINDy coefficients.

slab_shape_nu : float, optional (default 4)
Controls the spread of the slab. For values less than 4, the kurtosis
of nonzero coefficients is undefined. As the value increases past 4,
the variance and kurtosis approach :math:`s` and :math:`s^2`,
respectively.

slab_shape_s : float, optional (default 2)
Controls the spread of the slab. Higher values lead to more spread-out
nonzero coefficients.

noise_hyper_lambda : float, optional (default 1)
Rate hyperparameter for the exponential prior distribution over
the noise standard deviation.

num_warmup : int, optional (default 1000)
Number of warmup (or "burnin") MCMC samples to generate. These are
discarded before analysis and are not included in the posterior samples.

num_samples : int, optional (default 5000)
Number of posterior MCMC samples to generate.

mcmc_kwargs : dict, optional (default None)
Instructions for MCMC sampling; keyword arguments are passed to
numpyro.infer.MCMC. A "seed" entry, if present, is used to create the
JAX PRNG key (default seed 0).

Attributes
----------
coef_ : array, shape (n_features,) or (n_targets, n_features)
Posterior means of the SINDy coefficients.

mcmc_ : numpyro.infer.MCMC
Complete traces of the posterior samples.
"""

def __init__(
self,
sparsity_coef_tau0: float = 0.1,
slab_shape_nu: float = 4,
slab_shape_s: float = 2,
noise_hyper_lambda: float = 1,
num_warmup: int = 1000,
num_samples: int = 5000,
mcmc_kwargs: Optional[dict] = None,
unbias: bool = False,
**kwargs,
):

if unbias:
raise ValueError("SBR is incompatible with unbiasing. Set unbias=False")

super().__init__(unbias=unbias, **kwargs)

# check that hyperparameters are positive.
if sparsity_coef_tau0 <= 0:
raise ValueError("sparsity_coef_tau0 must be positive")
if slab_shape_nu <= 0:
raise ValueError("slab_shape_nu must be positive")
if slab_shape_s <= 0:
raise ValueError("slab_shape_s must be positive")
if noise_hyper_lambda <= 0:
raise ValueError("noise_hyper_lambda must be positive")

# check that sample counts are non-negative integers.
if not isinstance(num_warmup, int) or num_warmup < 0:
raise ValueError("num_warmup must be a non-negative integer")
if not isinstance(num_samples, int) or num_samples < 0:
raise ValueError("num_samples must be a non-negative integer")

# set the hyperparameters
self.sparsity_coef_tau0 = sparsity_coef_tau0
self.slab_shape_nu = slab_shape_nu
self.slab_shape_s = slab_shape_s
self.noise_hyper_lambda = noise_hyper_lambda

# set MCMC sampling parameters.
self.num_warmup = num_warmup
self.num_samples = num_samples

# set the MCMC kwargs.
if mcmc_kwargs is not None:
self.mcmc_kwargs = mcmc_kwargs
else:
self.mcmc_kwargs = {}

def _reduce(self, x, y):
# set up a sparse regression and sample.
self.mcmc_ = self._run_mcmc(x, y, **self.mcmc_kwargs)

# set the mean values as the coefficients.
self.coef_ = np.array(self.mcmc_.get_samples()["beta"].mean(axis=0))

def _numpyro_model(self, x, y):
# get the dimensionality of the problem.
n_features = x.shape[1]
n_targets = y.shape[1]

# sample the horseshoe hyperparameters.
tau = numpyro.sample("tau", HalfCauchy(self.sparsity_coef_tau0))
c_sq = numpyro.sample(
"c_sq",
InverseGamma(
self.slab_shape_nu / 2, self.slab_shape_nu / 2 * self.slab_shape_s**2
),
)

# sample the parameters and compute the predicted outputs.
beta = _sample_reg_horseshoe(tau, c_sq, (n_targets, n_features))
mu = jnp.dot(x, beta.T)

# compute the likelihood.
sigma = numpyro.sample("sigma", Exponential(self.noise_hyper_lambda))
numpyro.sample("obs", Normal(mu, sigma), obs=y)

def _run_mcmc(self, x, y, **kwargs):
# set up a jax random key.
seed = kwargs.pop("seed", 0)
rng_key = random.PRNGKey(seed)

# run the MCMC
kernel = NUTS(self._numpyro_model)
mcmc = MCMC(
kernel, num_warmup=self.num_warmup, num_samples=self.num_samples, **kwargs
)
mcmc.run(rng_key, x=x, y=y)

# return the fitted MCMC object; the samples are extracted in _reduce.
return mcmc


def _sample_reg_horseshoe(tau, c_sq, shape):
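# draw per-coefficient local scales, then damp the largest ones so that the
# coefficient variance saturates at c_sq (the "slab") rather than following
# the heavy Cauchy tail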
lamb = numpyro.sample("lambda", HalfCauchy(1.0), sample_shape=shape)
lamb_squiggle = jnp.sqrt(c_sq) * lamb / jnp.sqrt(c_sq + tau**2 * lamb**2)
beta = numpyro.sample(
"beta",
Normal(jnp.zeros_like(lamb_squiggle), jnp.sqrt(lamb_squiggle**2 * tau**2)),
)
return beta
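For orientation, the generative model implemented in this file can be written out as follows. This is a reading of the code above in the notation of Hirsh et al. (2021), not text from the PR, where :math:`\tau_0`, :math:`\nu`, :math:`s` and :math:`\lambda_{\mathrm{noise}}` correspond to sparsity_coef_tau0, slab_shape_nu, slab_shape_s and noise_hyper_lambda:

\lambda_i \sim \mathrm{HalfCauchy}(1), \qquad \tau \sim \mathrm{HalfCauchy}(\tau_0), \qquad c^2 \sim \mathrm{InvGamma}\left(\tfrac{\nu}{2}, \tfrac{\nu s^2}{2}\right)

\tilde{\lambda}_i^2 = \frac{c^2 \lambda_i^2}{c^2 + \tau^2 \lambda_i^2}, \qquad \beta_i \sim \mathcal{N}\left(0, \tau^2 \tilde{\lambda}_i^2\right)

\sigma \sim \mathrm{Exponential}(\lambda_{\mathrm{noise}}), \qquad \dot{x} \sim \mathcal{N}\left(\Theta(x)\,\beta^{\top}, \sigma\right)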
53 changes: 49 additions & 4 deletions test/test_optimizers.py
@@ -24,6 +24,7 @@
from pysindy.optimizers import EnsembleOptimizer
from pysindy.optimizers import FROLS
from pysindy.optimizers import MIOSR
from pysindy.optimizers import SBR
from pysindy.optimizers import SINDyPI
from pysindy.optimizers import SR3
from pysindy.optimizers import SSR
@@ -79,6 +80,7 @@ def predict(self, x):
(StableLinearSR3, True),
(TrappingSR3, True),
(DummyLinearModel, False),
(SBR, True),
],
)
def test_supports_multiple_targets(cls, support):
@@ -105,6 +107,7 @@ def data(request):
ElasticNet(fit_intercept=False),
DummyLinearModel(),
MIOSR(),
SBR(),
],
)
def test_fit(data_derivative_1d, optimizer):
@@ -124,14 +127,14 @@

@pytest.mark.parametrize(
"optimizer",
- [STLSQ(), SSR(), SSR(criteria="model_residual"), FROLS(), SR3(), MIOSR()],
+ [STLSQ(), SSR(), SSR(criteria="model_residual"), FROLS(), SR3(), MIOSR(), SBR()],
)
def test_not_fitted(optimizer):
with pytest.raises(NotFittedError):
optimizer.predict(np.ones((1, 3)))


@pytest.mark.parametrize("optimizer", [STLSQ(), SR3()])
@pytest.mark.parametrize("optimizer", [STLSQ(), SR3(), SBR()])
def test_complexity_not_fitted(optimizer, data_derivative_2d):
with pytest.raises(NotFittedError):
optimizer.complexity
@@ -325,6 +328,39 @@ def test_sindypi_fit(params):
assert np.shape(opt.coef_) == (10, 10)


@pytest.mark.parametrize(
"params",
[
dict(sparsity_coef_tau0=0),
dict(sparsity_coef_tau0=-1),
dict(slab_shape_nu=0),
dict(slab_shape_nu=-1),
dict(slab_shape_s=0),
dict(slab_shape_s=-1),
dict(noise_hyper_lambda=0),
dict(noise_hyper_lambda=-1),
dict(num_warmup=0.5),
dict(num_warmup=-1),
dict(num_samples=0.5),
dict(num_samples=-1),
],
)
def test_sbr_bad_parameters(params):
with pytest.raises(ValueError):
SBR(**params)


def test_sbr_fit(data_lorenz):
x, t = data_lorenz
opt = SBR(num_warmup=10, num_samples=10)
sindy = SINDy(optimizer=opt).fit(x=x, t=t)
assert hasattr(sindy.optimizer, "mcmc_")

expected_names = ["beta", "c_sq", "lambda", "sigma", "tau"]
result_names = sindy.optimizer.mcmc_.get_samples().keys()
assert all(expected in result_names for expected in expected_names)


@pytest.mark.parametrize(
"params",
[
@@ -667,6 +703,7 @@ def test_constrained_sr3_prox_functions(data_derivative_1d, thresholder):
(StableLinearSR3, {"trimming_fraction": 0.1}),
(SINDyPI, {}),
(MIOSR, {"constraint_lhs": [1]}),
(SBR, {}),
),
)
def test_illegal_unbias(data_derivative_1d, opt_cls, opt_args):
@@ -988,6 +1025,7 @@ def test_inequality_constraints_reqs():
StableLinearSR3,
TrappingSR3,
MIOSR,
SBR,
],
)
def test_normalize_columns(data_derivative_1d, optimizer):
@@ -1133,10 +1171,17 @@ def test_remove_and_decrement():
np.testing.assert_array_equal(expected, result)


- def test_pickle(data_lorenz):
+ @pytest.mark.parametrize(
+ ("opt_cls", "opt_args"),
+ (
+ (MIOSR, {"target_sparsity": 7}),
+ (SBR, {"num_warmup": 10, "num_samples": 10}),
+ ),
+ )
+ def test_pickle(data_lorenz, opt_cls, opt_args):
x, t = data_lorenz
y = PolynomialLibrary(degree=2).fit_transform(x)
- opt = MIOSR(target_sparsity=7).fit(x, y)
+ opt = opt_cls(**opt_args).fit(x, y)
expected = opt.coef_
new_opt = pickle.loads(pickle.dumps(opt))
result = new_opt.coef_
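Putting the new optimizer and its tests together, end-to-end usage looks roughly like the sketch below. This is a hedged illustration, not code from the PR: the harmonic-oscillator data, the sample counts, and the arviz.from_numpyro conversion are assumptions; SBR, SINDy and the mcmc_ attribute are as defined in this diff, and the sbr extras (numpyro, jax) plus arviz must be installed.

import arviz as az
import numpy as np

from pysindy import SINDy
from pysindy.optimizers import SBR

# toy data: a simple harmonic oscillator, x0' = x1, x1' = -x0
t = np.linspace(0, 10, 500)
x = np.stack([np.sin(t), np.cos(t)], axis=-1)

# small sample counts keep the sketch fast; real problems need more
opt = SBR(num_warmup=100, num_samples=500)
model = SINDy(optimizer=opt)
model.fit(x, t=t)
model.print()

# the full posterior traces remain available for uncertainty quantification
idata = az.from_numpyro(model.optimizer.mcmc_)
print(az.summary(idata, var_names=["beta", "sigma"]))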