-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscript.Rmd
458 lines (350 loc) · 16.4 KB
/
script.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
---
title: "Simulating Epidemiological Models with R"
author: "Noam Ross"
date: '2017-06-05'
output:
html_document:
keep_md: yes
knit_opts:
cache: yes
---
_This material is adapted from
[lessons from the 2012 EEID workshop](https://ms.mcmaster.ca/~bolker/eeid/)
by Helen J. Wearing, John M. Drake & Aaron A. King_
# Introduction - logistics
- I'm Noam. EHA diease ecologist, background in theoretical ecology. Do a mix of
applied and theoretical modeling, focus on dynamics.
- Here today to talk about epidemiological modeling.
- It's a big topic, and it's Friday at the end of a long week.
- So I'll aim to riff on some theoretical concepts. Then we'll try to dive
in as fast as possible into hands-on work.
- For our hands-on work we are going to use the R programming language.
It's not he easiest thing to use but it gives you the tools to adapt what
you learn to your own work.
- Now, many of you took my poll on what you knew about certain modeling
and programming topics. About half of you had some familiarity with R.
If you have, please raise your hand.
- I'd like you to pair up with someone who does not have their hand up. We're
practice "pair programming". Much of the lesson will be me live-coding on
the screen and you following along. Only one of you should follow-it doesn't
matter which.
- Secondly, you each have a green and red post-it. If you are having trouble,
if you've fallen behind, put up a red stickie like this.
- Laurie, who has some experience with this work, has agreed to act as my TA,
will dive in to help. When we do exercises, we'll use red to ask for help
and green to indicate you're done with a problem.
- Also, everything I type up here will be available to you as we go. If
you go to bit.ly/ehn2017, you can refresh the browser to update everything
that I'm doing.
# So, let's dive in!
In this lesson, I introduce some model structures that are useful for
representing the **temporal dynamics** of disease transmission. These are simple
models for representing how change in time represents interactions.
the models we look at here are fundamental and relatively general and
therefore readily extended for your own purposes in the future. Third, we
introduce some numerical tools that are useful for studying epidemiological
systems.
# Why do these types of models?
- These models are abstract representations of simplified systems. As the
saying goes, "All models are wrong, some models are useful." So, how
are these very simple models _useful_?
- Models of this sort tend not to be very _predictive_. For models, you
have to balance generality (representing transferable mechanisms),
accuracy (detailed representation of the system), and preditiveness.
- These simple models are _general_, and this is their advantage.
- They are readily extended and modified for your own purposes in the future
- Their simplicity lets us explore the implications of a small number of
key mechanisms. The insights we gain from studying a model help us understand
of a set of interactions gives rise to certain behaviors, and then we
us these to make inferences about how real-world systems proceed.
- We can apply our qualitative insights from them across multple systems.
- They are also good building blocks to extend to develop models for your
own purposes in the future.
# Chain binomial model
We begin by developing an intuitive understanding of the mechanics of the
transmission process by considering a simple model of an epidemic:
the **chain binomial** model.
This is one of the very first epidemic models. It is most commonly attributed
Lowell Reed and Wade Hamton Frost's work in the 1920s, and hence called the Reed-Frost model, but
a formulation of it was actually published in the 1880s by a physician-mathematician
names Piotr Dmitirievich En'ko, in Russia.
What are some properties of this model?
- _Compartmental_ model
- _Discrete_ in time and in population
- _Non-overlapping_ generations
- _Stochastic_
We designate the number of susceptible individuals at a given time as $I_t$,
and the infected individuals as $S_t$
This is a very simple model with types of organisms - susceptible and infected,
and two parameters, the contact rate $\beta$, and the number of susceptibles,
$S_0$.
df
This model stipulates that the epidemic evolves according to discrete
generations. In each generation, new infections are binomially distributed
with the number of trials equal to the number of susceptibles, $S_t$, and
probability of infection, $p=1-\exp(-\beta - I_t)$. In probability notation:
$$I_{t+1} \sim \mathrm{binom}(S_{t},1-\exp(-\beta\,I_{t}))$$
Susceptibles are then depleted by the number of these infections
$$S_{t+1} = S_{t}-I_{t+1}$$
Some of you may have taken statistics or probability courses and learned that the binomial random variable is
the number of independent "successes" in a sequence of weighted coin
tosses. The analogy here is that we toss a weighted coin (with probability of
heads $p=1-\exp(-\beta\,I_t)$) for each susceptible individual in the
population. If the weighted coin does in fact come up heads then the
susceptible individual becomes infected. Otherwise, it stays susceptible and
we move on to the next susceptible individual.
```{r}
state_0 <- c(S=2000, I=1) # do three things here. I create an object (state_0), I make an object with two variables in it, and
state_0
my_parms <- c(beta=0.001)
my_parms
```
Now we have to express that equation in code. To do so we write a
_function_. A function is a reusable bit of code that takes something
as an input, and returns something as an output. We'll write a function
that executes one step of this binomial chain.
```{r }
binomial_chain_step = function(state, parameters) {
S = state["S"]
I = state["I"]
beta = parameters["beta"]
I_next = rbinom(n = 1, size = S, prob = 1 - exp(-beta*I))
S_next = S - I_next
state_next = c(S = S_next, I = I_next)
return(state_next)
}
```
Now let's run this function, using our state at parameters as inputs.
```{r}
binomial_chain_step(state_0, my_parms)
```
We can do this several times and we see the randomness inherent in this model.
Now we write another function that runs this many times a row in a _loop_,
and store the results each time. It needs another input - how many steps to run?
```{r }
binomial_chain_simulate = function(state, parameters, n_steps) {
output = matrix(nrow = n_steps + 1, ncol = 3)
output[1,1] = 0
output[1,2] = state["S"]
output[1,3] = state["I"]
colnames(output) <- c("time", "S", "I")
for (step in 1:n_steps) {
output[step + 1, 1] = step
output[step + 1, 2:3] = binomial_chain_step(output[step, 2:3], parameters)
}
return(output)
}
```
```{r}
binomial_chain_simulate(state_0, my_parms, 10)
results <- binomial_chain_simulate(state_0, my_parms, 100)
```
We can plot these results.
```{r}
plot(I~time, data=results, type="l")
```
We'll now specify some parameters, simulate the model a few times, and plot
the results.
```{r }
n_sims <- 100
n_step <- 30
results_all <- list()
for (k in 1:n_sims) {
results_all[[k]] <- binomial_chain_simulate(state_0,my_parms, n_step)
}
```
```{r}
plot(c(0,20),c(0,400),type="n",xlab="time",ylab="I")
for (k in 1:n_sims) {
lines(I~time, data = results_all[[k]], type="l")
}
```
We can save this plot using the "Export" button on top of the Plot in RStudio.
>## Exercise
>
>Explore the dynamics of the system for different values of
>$\beta$, as well as different initial values of $S$ and $I$.
Although simple, the chain binomial model captures some key properties of the real biological process:
- demographic stochasticity - a type of process noise (to be contrasted with **measurement error**)
- item categorical class variables ($S$ and $I$ are integer-valued)
# Two-species model
We now extend this model to a *two species* version to look at spillover scenarios.
Now we have to classes each of susceptible and infected individuals: $S1, I1, S2, I2$
```{r}
state2_0 = c(S1 = 2000, I1 = 1, S2 = 2000, I2 = 0)
parameters2 = c(beta1 = 0.001, beta2 = 0.001, beta12 = 0.0001)
```
```{r }
binomial_chain_step2 = function(state, parameters) {
S1 = state["S1"]
S2 = state["S2"]
I1 = state["I1"]
I2 = state["I2"]
beta1 = parameters["beta1"]
beta2 = parameters["beta2"]
beta12 = parameters["beta12"]
I1_next = rbinom(n = 1, size = S1, prob= 1 -exp(-beta1 * I1))
S1_next = S1 - I1_next
I2_next = rbinom(n = 1, size = S2, prob = 1 - exp(-(beta2 * I2 + beta12 * I1)))
S2_next = S2 - I2_next
return(c(S1_next, I1_next, S2_next, I2_next))
}
```
```{r}
binomial_chain_step2(state2_0, parameters2)
```
Then we put these together in sequence to simulate an entire epidemic:
```{r}
binomial_chain_simulate2 = function(state, parameters, n_steps) {
output = matrix(nrow = n_steps + 1, ncol = 5)
colnames(output) <- c("time", "S1", "I1", "S2", "I2")
output[1,1] <- 0
output[1, 2:5] <- state
for (step in 1:n_steps) {
output[step + 1, 1] <- step
output[step +1, 2:5] <- binomial_chain_step2(output[step, 2:5], parameters)
}
return(output)
}
```
```{r}
binomial_chain_simulate2(state2_0, parameters2, 100)
results2 <- binomial_chain_simulate2(state2_0, parameters2, 100)
```
```{r}
plot(I1~time, data = results2, type="l", col="blue")
lines(I2~time, data= results2, type="l", col="red")
```
```{r }
n_sims = 1000
n_steps = 30
state2_0 = c(S1 = 2000, I1 = 1, S2 = 2000, I2 = 0)
parameters2 = c(beta1 = 0.0005, beta2 = 0.005, beta12 = 0.00001)
results_all2 = list()
for (k in 1:n_sims) {
results_all2[[k]] = binomial_chain_simulate2(state2_0, parameters2, n_steps)
}
```
```{r}
plot(c(0,30), c(0,1500), type="n", xlab="time", ylab="I")
for (k in 1:n_sims) {
lines(I1~time, data=results_all2[[k]], type="l", col="blue")
lines(I2~time, data=results_all2[[k]], type="l", col="red")
}
```
>## Challenge
>
> 1. Simulate a disease, such as rabies, where spillover is common from the reservoir host to
> the spillover host, but there is little spread in the spillover hosts.
> 2. Simulate a disease, such as influenza, where spillover from wild hosts
> is extremely rare, but the disease is highly contagious in spillover
> hosts
> 3. Make the disease in (2) less contagious in the wild host. What type of
> patterns do we see?
----
*This is as far as we at the EcoHealth Net Workshop*
# SIR Models
However, the chain binomial, like all models, is an approximation. One large assumption that it makes is that the generations are perfectly synchronized. For some diseases, this may not be such a bad approximation; for others, it might very well be.
There is an implicit parameter here - that the period of time that the diease
lasts is equal to 1. Both your model structure and parameters can contain assumptions.
Let's have a look at what can be done with models that don't make this assumption, i.e. generations of infection are not synchronized. In fact, for now, we'll take it one step further and assume that the change in the number of susceptible and infectious individuals in the population happens continuo=ntain assumptions.
The simplest place to start is with the classical $SIR$ model. This model divides the host population into three classes with respect to their infection status:
individuals are either Susceptible, Infected (and Infectious), or Recovered.
The model simply keeps track of how many individuals are in each class: individuals that leave one class must enter another.
The only exceptions, of course, are births and deaths.
The SIR model makes some assumptions: we're going to assume we have a large (technically infinitely large) population in which the effects of demographic stochasticity become negligible. Again,
this can be a useful simplifying assumption. There are options for stochastic,
overlapping generations models, of course, too.
The state variables change according to a
system of differential equations:
$$\begin{aligned}
\frac{dS}{dt} &= B-\lambda(I,t)\,S-\mu\,S\\
\frac{dI}{dt} &= \lambda(I,t)\,S-\gamma\,I-\mu\,I\\
\frac{dR}{dt} &= \gamma\,I-\mu\,R\\
\end{aligned}$$
Here, $B$ is the crude birth rate, $\mu$ is the per capita death rate, $N$ is the host population size, and $\gamma$ the recovery
rate. The term that makes this model interesting (and nonlinear) is the **force-of-infection**, represented by the function
$\lambda(I,t)$. We'll assume that it has the so-called
**frequency-dependent** form
$$\lambda(I,t) = \beta(t)\,\frac{I}{N}$$
So that the risk of infection faced by a susceptible individual is proportional to the fraction of the population that is infectious. Notice that we allow for the possibility of a contact rate, $\beta$, that varies in time. In this model, $S$, $I$, and $R$ may be interpreted either as proportions of the population (if $N=1$) or abundances (if $N>1$).
Like many epidemiological models, one can't solve the $SIR$ equations
explicitly. Rather, to find the trajectory of a continuous-time model such as the $SIR$, we must integrate those ordinary differential equations (ODEs) numerically. What we mean by this is that we use a computer algorithm to approximate the solution. In general, this can be a tricky business. Fortunately, this is a well studied problem in numerical analysis and (when the equations are smooth, well-behaved functions of a relatively small number of variables) standard numerical integration schemes are available to approximate the integral with arbitrary precision. Particularly, \R\ has very sophisticated ODE solving capabilities in the package \code{deSolve}. To use these algorithms we first load the package:
```{r }
# When running the script on your computer, you may need to to run
# install.packages("deSolve")
# to install this package before proceeding
library(deSolve)
```
The ODE solver needs to know the right-hand sides of the ODE.
We give it this information as a function:
```{r }
sir_diff_eqs <- function (time, state, parameters) {
## first extract the state variables
S <- state["S"]
I <- state["I"]
R <- state["R"]
N <- S+I+R
## now extract the parameters
beta <- parameters["beta"]
gamma <- parameters["gamma"]
mu <- parameters["mu"]
B <- parameters["B"]
## now code the model equations
dSdt <- B-beta*S*I/N-mu*S
dIdt <- beta*S*I/N-(mu+gamma)*I
dRdt <- gamma*I-mu*R
## combine results into a single vector
derivs <- list(c(dSdt,dIdt,dRdt))
## return result as a list!
return(derivs)
}
```
Notice that in this case we've assumed $\beta$ is constant.
We'll also write a function to calculate $R_0$.
```{r }
R0 <- function(parameters) {
R0 = parameters["beta"] /
(parameters["mu"]+parameters["gamma"])
}
```
We'll now define the times at which we want solutions, assign some
values to the parameters, and specify the **initial conditions**,
**i.e.**, the values of the state variables $S$, $I$, and $R$ at the
beginning of the simulation:
```{r }
times <- seq(0,30,by=1/120)
parameters <- c(B=1/70,mu=1/70,N=1,beta=400,gamma=365/14)
state_0 <- c(S=1-0.001-0.9,I=0.001,R=0.9)
```
Now we can simulate a model trajectory with the \code{ode} command:
```{r }
out <- ode(state_0,times,sir_diff_eqs,parameters)
```
and plot the results
```{r}
plot(R~time,data=out,type='l', ylim=c(0,1), ylab="Population")
lines(I~time,data=out,type='l',col='red'); par(new=TRUE)
lines(S~time,data=out,type='l', col='blue')
```
The "cycling" is even more apparent if we look at comparisons of
```{r}
plot(I~S,data=out) #,type='b',log='xy',yaxt='n',xlab='S',cex=0.5)
```
> ## Challenges
>
> Explore the dynamics of the system for different values of the
> $\beta$ and $B$ parameters by simulating and plotting trajectories
> as time series and in phase space (e.g., $I$ vs. $S$). How the $\beta$, $B$,
> and $R0$ related to the type of trajectories you get?
> ## Big Challenges (Pick one, if there is time, do as group)
> 1. What if you have a disease that has a latent period in the host before
> it starts infecting other hosts? Change the SIR model to an SEIR model
> with four compartments (Susceptible, EXPOSED, Infectious, Recovered) and
> plot the dynamics of all four.
> 2. What if there are seasonal dynamics to the disease? Change the model
> so that the value of either $\B$ cycles up and down, representing seasonality
> of reproduction, or $\beta$ cycles up and down, representing seasonality in
> contact. (Hint: there are `sin()` and `cos()` functions, and `time` is
> a variable in your differential equation function already). Plot several
> parameterizations of this model and compare to the version already created.