Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nimble #3

Merged
merged 5 commits into from
Aug 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/pr_actions.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ name: pr_check
on:
# Triggers the workflow on push or pull request events but only for the master branch
pull_request:
types: [opened, reopened, edited]
branches: [main, master]

# A workflow run is made up of one or more jobs that can run sequentially or in parallel

Expand Down
127 changes: 126 additions & 1 deletion 01_nimble.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -452,7 +452,7 @@ make_MCMC_comparison_pages(res, modelName = 'code_neg_bin',dir = "/tmp/",

```

# Another example: multivariate normal with zeri inflated component
# Another example: multivariate normal with zero inflated component

We consider a multivariate Gaussian model with zero inflation, where the probability in the zero inflation can depend on the variable. The vector of observations $Y_i$ lives in $\mathbb{R}^p$, and its distribution is defined conditionnally on a (multivariate) Bernoulli $W_i \in {0,1}^p$ and a multivariate Gaussian variable $Z_i\in\mathbb{R}^p$:

Expand Down Expand Up @@ -616,3 +616,128 @@ ggs(my_MCMC) %>% filter(stringr::str_detect(Parameter, "prec")) %>%
#| label: true-precision
print(round(Omega,3))
```


# Zero inflated multivariate normal revisited

All the above code uses a workaround to avoid defining a new distribution in Nimble which is a ZI multivariate normal.

Let $Y$ be a random vector.
We denote $\mathbf{i}_*$ the set of indexes for which $Y$ is non zero, and $\mathbf{i}_0$ the set of indices for which $Y$ is 0.

For the zero inflated normal distribution, the p.d.f. is given by:
$$
p(y \vert \mu, \Sigma, \pi) = \varphi(y_{\mathbf{i}_*}\vert \mu_{\mathbf{i}_*}, \Sigma_{\mathbf{i}_*;\mathbf{i}_*})\prod_{j\in \mathbf{i_0}}\pi_j \prod_{k\in \mathbf{i}_*}(1 - \pi_k)\,,
$$
where $\varphi(y_{\mathbf{i}_*}\vert \mu_{\mathbf{i}_*}, \Sigma_{\mathbf{i}_*;\mathbf{i}_*})$ is the p.d.f. of a multivariate normal distribution restricted to the indexes where $y$ is non 0 (and to the corresponding parameters).
This distribution can be coded as a `nimbleFunction`.
It is important here that the first argument must correspond to the value at which the density is evaluated, and that the `log` argument is mandatory.

```{r}
#| label: dZInormal
dZInormal <- nimbleFunction(
run = function(x = double(1),
prob = double(1),
mu = double(1),
sigma = double(2),
log = integer(0, default = 0)){
returnType(double(0))
non_nul_indexes <- which(x!=0)
nul_indexes <- which(x == 0)
p_term <- sum(log(prob[nul_indexes])) +
sum(log(1 - prob[non_nul_indexes]))
mu_term <- 0
if(length(non_nul_indexes) > 0){
chol_mat <- chol(sigma[non_nul_indexes, non_nul_indexes])
restricted_x = x[non_nul_indexes]
restricted_mu = mu[non_nul_indexes]
mu_term <- dmnorm_chol(restricted_x,
restricted_mu,
chol_mat, prec_param = FALSE, log = TRUE)
}
log_output <- p_term + mu_term
if(log){
return(log_output)
}
else{
return(exp(log_output))
}
}
)
```

From my experience, it is better to register this distribution to avoid any confusion about its parameters, nature and dimension of output, etc...

```{r}
registerDistributions(list(
dZInormal = list(BUGSdist = "dZInormal(prob, mu, sigma)", # How to call in nimble
discrete = FALSE, # Distribution is not discrete
pqAvail = FALSE, # CDF and quantile function are not available
types = c('value = double(1)', # The random variable is a vector
'prob = double(1)', # a vector
'mu = double(1)', # vector
'sigma = double(2)')) # double(2) is a matrix
))
```

Note that it tells us that we do not provide any way of simulating from this distribution. This is not a problem to run a standard MCMC.
Moreover, providing CDF and quantile function could allow some gain in efficiency (at least, that what the help pages says).

The code for the model is then direct:

```{r}
my_code <- nimbleCode({
for(j in 1:p){
prob[j] ~ dunif(0, 1)
mu[j] ~ dnorm(0, 1)
}
sigma[1:p, 1:p] ~ dwish(Ip[1:p, 1:p], p)
for(i in 1:n){
# I put my custom distribution
Y[i, 1:p] ~ dZInormal(prob[1:p], mu[1:p], sigma[1:p, 1:p])
}
})
```

And now, let's run it!


```{r}
#| label: fit-ZI-normal
#| cache: true
# Generating data
set.seed(123)
n_obs <- 1000
n_species <- 3
# Values
U <- mvtnorm::rmvnorm(n = n_obs,
mean = 0:(n_species - 1),
sigma = diag(1, n_species))
# Mask (matrix of zeros and ones)
Z <- rbinom(n = n_obs * n_species, size = 1, prob = .8) %>%
matrix(nrow = n_obs)
# Observations
Y <- round(U * Z, 9)



my_model <- nimbleModel(code = my_code,
constants = list(p = n_species,
n = n_obs,
Ip = diag(1, n_species)),
data = list(Y = Y),
inits = list(mu = rep(0, n_species),
prob = rep(0.5, n_species),
sigma = diag(1, n_species)))
results <- nimbleMCMC(my_model,
samplesAsCodaMCMC = TRUE,
nchains = 2, niter = 10000,
nburnin = 1000, thin = 10)

ggs(results) %>%
ggplot(aes(x = Iteration, y = value, color = factor(Chain))) +
facet_wrap(~Parameter, scales = "free") +
geom_line()
```


100 changes: 0 additions & 100 deletions script_dZInormal.R

This file was deleted.

Loading