---
title: "Generalized Linear Mixed Models"
subtitle: "An Introduction"
author: "Joshua F. Wiley"
date: "`r Sys.Date()`"
output:
  tufte::tufte_html: default
  tufte::tufte_book: default
---
```{r setup}
options(digits = 2)
library(data.table)
library(lme4)
library(ggplot2)
library(visreg)
## read in the dataset
d <- readRDS("aces_daily_sim_processed.RDS")
```
# The Generalized Linear Model
We have already seen the linear model^[For example,
t-tests, Analysis of Variance, Pearson correlations,
and Linear Regression are all cases of linear models]
- $y = b_0 + b_1 * x_1 + ... + b_k * x_k + e$
The linear model is a *special case* of the **generalized linear
model** or **GLM**
The GLM extends the linear model to different types of
outcomes^[The probit model is also used for binary variables.
Unlike normally distributed outcomes, count outcomes take on
integer values (0, 1, 2, *not* 0.5 or 1.5)
and counts cannot be negative.]
- binary 0/1 variables (logistic regression)
- count variables (Poisson regression)
- Many other types of variables
This lecture focuses on *logistic mixed effects models* for binary variables
------------------------------------------
## Notation and some Theory
We already learned about linear regression
- $y = b_0 + b_1 * x_1 + ... + b_k * x_k + e$
First, we simplify the notation a bit
- $\eta = b_0 + b_1 * x_1 + ... + b_k * x_k$
- $y = \eta + e$
For linear regression, we assumed a *normal* distribution
The normal distribution has two parameters^[Note the mean is sometimes called
the *location* because it controls the location of the center of the
distribution, and the standard deviation is sometimes called the *scale*
because it controls the scale or spread of the distribution]
- mean ($\mu$)
- standard deviation ($\sigma$)
- Normal distribution can be written: $N(\mu, \sigma)$
In linear regression, what we are really assuming is that
- $y \sim N(\eta, \sigma)$
Which is read, "Y is distributed as a normal distribution with mean
equal to $\eta$ and standard deviation $\sigma$".
$\sigma$ is typically estimated as the standard deviation of the
residuals, $e$.
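As a small sketch using `R`'s built-in `mtcars` data (not the study data;
`m.sketch` is just an illustrative name), the residual standard deviation
from a fitted linear model is the estimate of $\sigma$:
```{r}
## sketch only: fit a simple linear model to built-in data
m.sketch <- lm(mpg ~ hp, data = mtcars)
## the estimated sigma (residual standard deviation)
sigma(m.sketch)
## nearly the same as the standard deviation of the residuals
sd(resid(m.sketch))
```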
------------------------------------------
Any GLM has to be a linear model at some level, that is, this is
always true:
- $\eta = b_0 + b_1 * x_1 + ... + b_k * x_k$
The GLM generalizes linear regression by defining a *function* that
links or transforms $\eta$ to the outcome
The link function is always called $g()$. Likewise an inverse link
function exists that simply "undoes" whatever the link function
did. This is called $g^{-1}()$.
In normal linear regression, the link function and inverse link
function are just the identity function, that is *there is no
transformation*
For many other types of regression, including *logistic regression*
the link and inverse link functions do some transformation.
In terms of formulae, we define the expected value of $y$, its mean,
as $\mu$
- $E(y) = \mu =$ mean of $y$
Then
- $\eta = g(\mu)$
- $\mu = g^{-1}(\eta)$
For normal linear regression because the link and inverse link
functions are the identity function this works out to be
- $\eta = g(\mu) = \mu$
- $\mu = g^{-1}(\eta) = \eta$
The powerful concept of the GLM is that by using link functions, the
linear model
- $\eta = b_0 + b_1 * x_1 + ... + b_k * x_k$
always stays the same, whether the outcome is continuous and normally
distributed, binary and not normally distributed, a count, or any
other type of variable with its assumed distribution
This means that, in some sense, all GLMs are interpreted like linear
regression, so in a way you already know logistic regression!
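As a concrete illustration, the `family` objects used by `glm()` and
`glmer()` carry the link and inverse link functions, so you can see that
the normal (gaussian) family uses the identity while the binomial family
uses the logit:
```{r}
## identity link: values are returned unchanged
gaussian()$linkfun(c(.2, .5, .8))
## logit link: log(mu / (1 - mu))
binomial()$linkfun(c(.2, .5, .8))
## the inverse link "undoes" the link
binomial()$linkinv(binomial()$linkfun(c(.2, .5, .8)))
```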
Next, we explore the link functions for logistic regression that
make the transformation from a linear scale to a binary scale.
------------------------------------------
## Logistic Regression
Logistic regression is regression when the outcome only takes on two
values: 0 or 1.
Logistic regression is useful for many questions
- What predicts whether someone will have major depression or not?
- Does one treatment have a higher probability of patients *remitting*
from major depression than another treatment?
------------------------------------------
Linear regression will not work for binary outcomes for two reasons:
- First, when an outcome can only be 0 or 1, a straight line is a very
bad fit
- straight lines could predict a value of 1.4 or 2 or -0.3 when
those values are not possible
- Second, there is no way that a binary variable or its residuals will
follow a normal distribution: the normality assumption would be
violated
The GLM solves each of these problems separately
- Link functions transform the linear predicted value $\eta$ so that
  it never goes below 0 and never goes above 1
- Rather than assume a normal distribution, the GLM uses a new
distribution, the Bernoulli distribution
- Whereas the Normal distribution had two parameters, mean and
standard deviation, the Bernoulli distribution only has one: the
probability that an event will occur
------------------------------------------
For logistic regression, the link function is defined as:
- $\eta = g(\mu) = ln\left(\frac{\mu}{1 - \mu}\right)$
That is called the *logit* function.
Here $\mu$ is the probability that the outcome will
be 1. Probabilities range from 0 to 1: an outcome cannot happen less
than never (probability of 0) or more than always (probability of 1).
Graphing $\mu$ against $\mu$ we get
```{r, fig.width = 6, fig.height = 4, fig.cap = "Figure 1. Probabilities against Probabilities"}
tmp <- data.frame(mu = seq(from = .01, to = .99, by = .01))
ggplot(tmp, aes(mu, mu)) +
geom_point()
```
The first part of the *logit* function:
- $\frac{\mu}{1 - \mu}$
unbounds $\mu$ on the right side, so that as $\mu$ goes to 1, the
transformed value goes to infinity, shown graphically below
```{r, fig.width = 6, fig.height = 4, fig.cap = "Figure 2. Probabilities against right unbounded probabilities"}
ggplot(tmp, aes(mu, mu/(1 - mu))) +
geom_point()
```
The next part, the natural logarithm, unbounds it on the left side
because the log of 0 is negative infinity
- $ln\left(\frac{\mu}{1 - \mu}\right)$
Combined, the graph below shows the raw $\mu$ values against the
transformed values.
```{r, fig.width = 6, fig.height = 4, fig.cap = "Figure 3. Probabilities against unbounded probabilities (logit scale)"}
ggplot(tmp, aes(mu, log(mu/(1 - mu)))) +
geom_point()
```
In this way, the link function for logistic regression takes the
probability of the event occurring, which can only fall between 0 and
1, and transforms it to fall between negative infinity and positive
infinity
Now we have a continuous and unbounded outcome we can apply
a linear model to!
To go from predictions on the linear model back to the probability
scale, we use the inverse link function:
- $\mu = g^{-1}(\eta) = \frac{1}{1 + e^{-\eta}}$
That is, suppose we had the linear predicted values, $\eta$ shown
below graphically
```{r, fig.width = 6, fig.height = 4, fig.cap = "Figure 4. Linear predictor against linear predictor"}
tmp2 <- data.frame(eta = seq(from = -5, to = 5, by = .1))
ggplot(tmp2, aes(eta, eta)) +
geom_point()
```
Using the inverse link function, we transform them to again fall
between 0 and 1, like probabilities.
```{r, fig.width = 6, fig.height = 4, fig.cap = "Figure 5. Linear predictor (logit scale) against probabilities"}
ggplot(tmp2, aes(eta, 1/(1 + exp(-eta)))) +
geom_point()
```
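Base `R` provides these two transformations directly: `qlogis()` is the
logit link and `plogis()` is its inverse; we will use `plogis()`
repeatedly later.
```{r}
## logit link: log(mu / (1 - mu))
qlogis(c(.2, .5, .8))
## inverse link: 1 / (1 + exp(-eta)); recovers the original probabilities
plogis(qlogis(c(.2, .5, .8)))
```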
------------------------------------------
## Logistic Regression Assumptions
Much like a normal distribution was assumed for normal linear
regression, for logistic regression a Bernoulli distribution is
assumed.
- $y \sim Bern(\mu)$
The Bernoulli distribution has only one parameter, the probability of the
event occurring.
In logistic regression the outcome is either 0 or 1, so outliers on
the outcome are not a concern
Outliers on predictors may still be a concern
One problem that sometimes arises is known as separation. This occurs
when some predictor perfectly predicts (separates) the outcome, most
often when the outcome is rare or the sample size is small.
- In a sample of 60 people where only 10 have major depression, if 80%
  of the sample also have PTSD, it may happen that all cases of major
  depression occurred in people with PTSD, so the percentage of
  depression in those without PTSD is exactly 0%, which is
  separation (a toy illustration follows below).
- The usual solution is to remove predictors or collapse groups in
  these cases
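Here is a toy sketch of separation using made-up data (the `toy` and
`m.sep` objects are purely illustrative): the predictor perfectly
separates the outcome, `glm()` warns about fitted probabilities of 0 or 1,
and the coefficient and standard error blow up.
```{r}
## hypothetical, perfectly separated data:
## everyone with ptsd = 1 has depression = 1
toy <- data.frame(
  depression = c(0, 0, 0, 0, 1, 1, 1, 1),
  ptsd       = c(0, 0, 0, 0, 1, 1, 1, 1))
## expect warnings (fitted probabilities of 0 or 1, possibly non-convergence);
## the ptsd coefficient and its standard error are huge
m.sep <- glm(depression ~ ptsd, family = binomial(), data = toy)
summary(m.sep)
```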
Logistic regression still requires that the errors be independent
(i.e., individuals are independent, not dependent or repeated
measures). Generalized linear mixed models (GLMMs), and specifically
logistic mixed effects models, relax the independence assumption.
Logistic regression also assumes that there is a linear relationship
on the link scale (the logit scale).
Finally, logistic regression generally requires a large sample size.
There are no degrees of freedom; instead, the sample must be large enough
that the parameter estimates are approximately normally distributed
(central limit theorem).
------------------------------------------
## Simple Logistic Regression (Example 1)
First, let's test whether women or men in the study are more likely to
be born in or outside of Australia. To begin with and avoid a repeated
measures dataset, we will make a dataset without repeated
data.^[Download the script to run all the examples in lecture here:
https://jwiley.github.io/MonashHonoursStatistics/GLMM_slides.rmd
Note that this is a new kind of file, but you can also open it in
RStudio. It combines text and has `R` code in "chunks" marked by three
back ticks and is called an `R` markdown file. `R` markdown is a very
flexible way to write text or notes in between pieces of `R` code. The
whole document can be "processed" or rendered by `R` and the result
has the text, the code, and the output, which is a great way to do
your analyses while keeping records and notes of why you are doing
things or how to interpret them. To see more possibilities, go to:
https://rmarkdown.rstudio.com/ ]
```{r}
## create a new dataset with just 6 variables
## BSTRESS and BPosAff are average stress and PosAff
wd <- d[, .(UserID, BornAUS, Female,
Age, BSTRESS, BPosAff)]
## omit any rows (observations) that are missing
## any variable
wd <- na.omit(wd)
## exclude any rows with duplicated IDs
## this results in just one row per ID
wd <- wd[!duplicated(UserID)]
dim(wd)
```
There are `r nrow(wd)` people in this little dataset.
Below we ask `R` for frequency tables
```{r}
table(wd$Female)
table(wd$BornAUS)
```
To run the logistic regression, we need four pieces
- the `glm()` function for the logistic regression model
- A formula indicating our model of the form:
- dependent variable ~ predictor
- The name of the assumed distribution, `binomial()`
- The name of the dataset, "wd"
```{r}
m1 <- glm(BornAUS ~ Female,
family = binomial(),
data = wd)
summary(m1)
```
The output shows several things:
- The deviance residuals
- A coefficient table, like linear regression
- Estimate: familiar regression coefficients on linear scale
- Std. Error: standard error for the regression coefficients
  - z value: takes the place of the t statistic from normal linear
    regression; this is essentially a z-test
- Pr(>|z|): the p-value, based on the Z value
- The **residual deviance** degrees of freedom: sample size - number
parameters (here two: one for intercept, one for Female slope)
- The Akaike Information Criterion (AIC) which can be used to compare
different models.
With a simple model, we can easily convert from the logit scale
to the probability scale
Men are the reference group, captured by the intercept
- $\frac{1}{1 + e^{- (`r coef(m1)[1]`)}} = `r as.vector(plogis(coef(m1)[1]))`$
- Men have a `r as.vector(plogis(coef(m1)[1]))` probability of
being born in Australia
The coefficient for women indicates that women have `r coef(m1)[2]` higher
log odds (the logit scale) of being born in Australia than do men
To calculate the probability that women will be born in Australia:
- $\frac{1}{1 + e^{- (`r coef(m1)[1]` + `r coef(m1)[2]`)}} = `r as.vector(plogis(sum(coef(m1))))`$
- Women have a `r as.vector(plogis(sum(coef(m1))))` probability of
being born in Australia
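A quick way to check these by-hand calculations is with `plogis()`,
which computes $\frac{1}{1 + e^{-x}}$:
```{r}
## probability for men (the intercept only)
plogis(coef(m1)[["(Intercept)"]])
## probability for women (intercept plus the Female coefficient)
plogis(sum(coef(m1)))
```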
------------------------------------
Rather than interpret coefficients on the log odds (logit) scale,
people commonly exponentiate the coefficients so they are on the odds
scale.
- Women have $e^{`r coef(m1)[2]`} = `r exp(coef(m1)[2])`$ times the odds
of being born in Australia as do men
- This is called the odds ratio (OR)
An example summary of this analysis is:
"Women were more likely to be born in Australia than men, (OR = `r exp(coef(m1)[2])`, p
= .039)."
Another way of saying the same thing:
"Women had `r exp(coef(m1)[2])` times the odds of being born in
Australia as women (p = .039)."
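For reference, the odds ratios can be obtained by exponentiating the
model coefficients:
```{r}
## exponentiate coefficients from m1 to get odds ratios
exp(coef(m1))
```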
## Simple Logistic Regression (Example 2)
Next we examine a continuous predictor: average positive affect as a
predictor of whether participants were born in Australia.
The figure below shows the distribution of average positive affect, "BPosAff"
```{r, fig.width = 6, fig.height=4}
ggplot(wd, aes(BPosAff)) +
geom_histogram(binwidth=1, colour = "black")
```
The analysis in `R` is very similar to the previous example:
```{r}
m2 <- glm(BornAUS ~ BPosAff,
family = binomial(),
data = wd)
summary(m2)
```
- The coefficient for average positive affect indicates that for every
  one unit change in average PosAff, people have `r coef(m2)[2]` higher log
  odds of being born in Australia
- In odds ratios:
- For each additional unit higher average positive affect, participants have
`r exp(coef(m2)[2])` times the odds of being born in Australia (p = .037).
- One challenge with the odds ratio metric is that it is multiplicative
- If the baseline odds are 2, then 1.2 times the odds would be
2.4 (an absolute difference of 0.4)
- If the baseline odds are 4, then 1.2 times the odds would be 4.8
(an absolute difference of 0.8)
- It is important to keep this interpretation in mind; odds ratios do not
  directly indicate the probability of the event occurring, just how
  many times higher the odds are
We can calculate the specific probability of being born in Australia
as we did for sex
Probability of being born in Australia for participants with 0 average
positive affect
- $\frac{1}{1 + e^{- (`r coef(m2)[1]`)}} = `r as.vector(plogis(coef(m2)[1]))`$
- Participants with 0 average positive affect have a
`r as.vector(plogis(coef(m2)[1]))` probability of being born in Australia
Probability of being born in Australia for participants with a 4
average positive affect
- First calculate the log odds
- $`r coef(m2)[1]` + `r coef(m2)[2]` * 4 = `r coef(m2)[1] + 4 * coef(m2)[2]`$
- Then convert the log odds to probabilities
- $\frac{1}{1 + e^{- (`r coef(m2)[1] + 4 * coef(m2)[2]`)}} = `r as.vector(plogis(coef(m2)[1] + 4 * coef(m2)[2]))`$
- Participants with a 4 average positive affect have a
`r as.vector(plogis(coef(m2)[1] + 4 * coef(m2)[2]))` probability
of being born in Australia
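These by-hand calculations can also be checked with `predict()`;
`type = "response"` requests predictions on the probability scale:
```{r}
## predicted probabilities at average positive affect of 0 and 4
predict(m2, newdata = data.frame(BPosAff = c(0, 4)), type = "response")
```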
## Multiple Logistic Regression (Example 3)
Include both sex and average positive affect as predictors.
```{r}
m3 <- glm(BornAUS ~ Female + BPosAff,
family = binomial(),
data = wd)
summary(m3)
```
- Results show that women are more likely to be born in Australia than
men, controlling for average positive affect
- In the odds ratio metric, women have
  `r exp(coef(m3)[2])` times the odds of being born in Australia as
  men, controlling for average positive affect.
- Note that on the odds ratio metric, a 1 means no difference (1 times the
  odds), values less than 1 indicate lower odds, and values above 1
  indicate higher odds
In this model, the probability of being born in Australia depends on
both sex and average positive affect, so it is not as easy to calculate
- to calculate a probability, you need to pick values for *all predictors*
- e.g., the probability for a woman with 2 average positive affect
- or the probability for a woman with 4 average positive affect, etc.
  (a sketch follows below)
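As a rough sketch (assuming, as in the earlier examples, that women are
coded 1 and men 0), the probabilities can be computed by plugging chosen
values for *all predictors* into the inverse link:
```{r}
## probability of being born in Australia for a woman (Female = 1)
## with average positive affect of 2 and of 4
plogis(coef(m3)[1] + coef(m3)[2] * 1 + coef(m3)[3] * c(2, 4))
```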
## Reporting Logistic Regression
Logistic regression results are often presented as
- Coefficient (standard error) on the log odds scale
- Odds ratio [95% confidence interval] on the odds ratio scale
We already saw the first example; to get odds ratios and confidence
intervals there are a couple of steps
- Extract coefficients and confidence intervals
- Convert to odds ratios by exponentiating
```{r}
## odds ratios
exp(coef(m3))
## 95% confidence intervals for odds ratios
exp(confint.default(m3))
```
From these results, we could say: "Adjusting for sex, each additional
unit of average positive affect was associated with
`r exp(coef(m3)[3])` times the odds
[95% CI `r exp(confint.default(m3))[3, 1]`,
`r exp(confint.default(m3))[3, 2]`] of being born in Australia, p =
.024."
We can also plot the results from logistic regressions using the
`visreg()` function we have used for interactions.
```{r, fig.width = 6, fig.height=6}
visreg( m3,
xvar = "BPosAff",
by = "Female",
partial = FALSE, rug = FALSE,
overlay=TRUE)
```
By default, the graphs from `visreg()` are on the logit (log odds)
scale. This shows clearly how we are still dealing with a
(generalized) linear model. However, this is a very difficult scale to
interpret. To make graphs easier to interpret and present, it is
useful to present them on the probability scale. Although it is
tedious to calculate probabilities by hand, it is straightforward
with `visreg()` by specifying an additional argument, `scale =
"response"` to indicate that we want the results on the response
scale rather than the link scale.
One thing you will notice about this graph, although it is subtle, is that
the difference in probabilities is not strictly linear, nor is the
difference between women and men the same at all levels of
average positive affect on the probability scale. Although logistic
models are still linear on the link scale, on the response
(probability) scale, they are not strictly linear.
```{r, fig.width = 6, fig.height = 6}
visreg( m3,
xvar = "BPosAff",
by = "Female",
partial = FALSE, rug = FALSE,
overlay=TRUE,
scale = "response")
```
# Generalized Linear Mixed Models
Generalized Linear Mixed Models (GLMMs) are just like regular linear
mixed models (LMMs) except that instead of the outcome being a
continuous, normally distributed variable, we assume the outcome is
a binary (0/1) variable that follows a Bernoulli distribution, and we
use the logit link function to map the outcome to our predictors on a
linear scale.
Otherwise, GLMMs and LMMs are basically the same. On the logit (link)
scale, we still assume a linear association of predictors and the
outcome. We still assume that the random effects follow a
(multivariate) normal distribution *on the link scale*.
For GLMMs, we cannot even approximate the degrees of freedom for the
model. Because of this, the standard approach is to simply assume that
the sample size is large enough that the t distribution approximates a
normal distribution. In GLMMs, p-values are therefore based on a
z statistic, z = estimate / SE, with p-values looked up from the normal
distribution. Finally, the Bernoulli / Binomial distribution does not
have a separate variance parameter, so there is no residual variance
estimate. Consequently, things like the ICC are not readily
calculated (although if critical to your research, approximations do
exist).
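As a sketch of one common approximation (the latent variable approach),
the residual variance on the logit scale is treated as fixed at
$\pi^2/3$; the chunk below is not evaluated because it refers to
`glmm1`, the intercept-only model fit later in this lecture.
```{r, eval = FALSE}
## approximate ICC for a logistic mixed model (latent variable approach):
## random intercept variance divided by itself plus pi^2 / 3
v.int <- VarCorr(glmm1)$UserID[1, 1]
v.int / (v.int + (pi^2) / 3)
```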
One other note is that although LMMs can have some convergence issues,
estimating GLMMs is even harder for the computer. They typically are
slower to estimate and are more prone to convergence issues.
## GLMMs in `R`
GLMMs, and specifically logistic mixed models, are applicable any time you
have repeatedly measured (or multilevel, such as children within
families) data where the outcome is binary.
In the daily diary study data we have used, we do not have a naturally
repeatedly measured, binary outcome, so we will make one by defining
positive affect scores above 3 (the scale midpoint) as "happy" (scored
1) and otherwise as "unhappy" (scored 0).
```{r}
## create a binary positive affect variable
## called PosAffBin, defined as integer values for PosAff > 3
## where 1 = TRUE (PosAff > 3) and 0 = FALSE (PosAff <= 3)
d[, PosAffBin := as.integer(PosAff > 3)]
```
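A quick frequency check of the new binary variable:
```{r}
## counts of "unhappy" (0) and "happy" (1) observations
table(d$PosAffBin)
```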
We fit GLMMs by adding a "g" to `lmer()`, using the `glmer()`
function. `glmer()` works almost identically to `lmer()`, except that
you also must specify the family, which is the assumed distribution
for the outcome. We use `binomial()`.
```{r}
glmm1 <- glmer(PosAffBin ~ 1 + (1 | UserID),
family = binomial(),
data = d)
summary(glmm1)
```
The output is fairly similar to output for LMMs.
- `R` first tells us how it fit the model (maximum likelihood based on
the Laplace approximation).
- Then it tells us the family and link function, logit.
- Then it tells us the formula for the model and the dataset.
- Information criterion (AIC, BIC), the log likelihood, deviance (-2 *
log likelihood) and the residual degrees of freedom, which are used
for the AIC and BIC.
- Scaled residuals are a sort of normalized residual on the logit
scale.
- Random effects are presented on the link (logit) scale and on this
scale are assumed to follow a normal distribution. Note that there
is no estimated residual variance, due to the outcome being assumed
to follow a Bernoulli / Binomial distribution.
- The number of observations and groups (here people) are reported.
- The fixed effects section is familiar with estimates (coefficients),
standard errors (Std. Error), z values in place of t values, and
p-values based off the z values, Pr(>|z|).
Below we visualise the BLUPs for the random intercept on the logit
(link) scale and the probability (response) scale.
Note that the first is assumed to be normally distributed (appears
somewhat violated in this instance). The second does not need to (and
will not) follow a normal distribution. The second distribution
on the probability scale shows the average probability that each
participant is "happy" (PosAff > 3) at any given survey.
```{r, fig.width = 6, fig.height = 4}
ggplot(coef(glmm1)$UserID, aes(`(Intercept)`)) +
geom_histogram() +
ggtitle("BLUPs on logit scale")
ggplot(coef(glmm1)$UserID, aes(plogis(`(Intercept)`))) +
geom_histogram()+
ggtitle("BLUPs on probability scale")
```
We can easily convert the fixed effect intercept estimate to a
probability for interpretation using the `plogis()` function in `R`.
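For example, the fixed effect intercept from `glmm1` converted to the
probability scale:
```{r}
## fixed effect intercept on the logit scale, then as a probability
fixef(glmm1)[["(Intercept)"]]
plogis(fixef(glmm1)[["(Intercept)"]])
```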
However, GLMMs have a complexity: because the logit function is not
linear, transforming from the link scale to the probability
scale changes the results.
For example, the average of the individual estimates on the link scale,
converted to probabilities, is not the same as the average of the
individual estimates on the probability scale.
To see this, we calculate the BLUPs for the random effect intercept.
In one case, we take the average then back transform to probabilities
using the `plogis()` function. In the second example, we back
transform then average. The results are rather different.
**If you want the average probability, the second approach is the
correct one.**
```{r}
## random intercept BLUPs
blups <- coef(glmm1)$UserID[["(Intercept)"]]
## back transform average of the BLUPs
plogis(mean(blups))
## average back transformed BLUPs
## this is the correct way to estimate the average
## probability of being happy
mean(plogis(blups))
```
An implication of this difference is that although you can easily
generate predicted plots for the fixed effects of GLMMs, these are
only valid on the link scale.
As with regular LMMs, we can add predictors and include them as fixed
and random effects, if desired. This next model includes stress as both
a fixed and a random effect predicting binary positive affect.
Note that this takes a few seconds to run.
```{r}
glmm2 <- glmer(PosAffBin ~ 1 + STRESS + (1 + STRESS | UserID),
family = binomial(),
data = d)
summary(glmm2)
```
In the results, we can see the usual information. All of these results
can be interpreted basically the same as for LMMs *on the link
scale*. For example, we can interpret that when stress is 0,
the average log odds of being happy is `r fixef(glmm2)[1]` and that
about two thirds of people have a log odds of being happy between
`r fixef(glmm2)[1] - sqrt(VarCorr(glmm2)$UserID[1,1])` and
`r fixef(glmm2)[1] + sqrt(VarCorr(glmm2)$UserID[1,1])`, found by
taking the fixed effect intercept estimate plus or minus the random
effect intercept standard deviation estimate.
We can interpret the slope for stress as indicating that on average, a
one unit increase in stress is associated with a `r fixef(glmm2)[2]`
change in the log odds of being happy. We also can exponentiate the
estimate and say that the fixed effects revealed that each unit change
in stress was associated with `r exp(fixef(glmm2)[2])` times the odds
of being happy^[Here the odds ratio is calculated by taking: $e^b$ or
in `R` `exp(fixef(glmm2))`].
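As a quick check, here are the fixed effects on the logit scale and as
odds ratios:
```{r}
## fixed effects on the logit (log odds) scale
fixef(glmm2)
## exponentiated to the odds ratio scale
exp(fixef(glmm2))
```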
Plotting or visualization on the log odds scale for the fixed effects
is also easy.
```{r, fig.width = 6, fig.height = 4}
visreg(glmm2,
xvar = "STRESS",
partial=FALSE, rug=FALSE)
```
However, although it is equally easy to produce a plot on the
probability scale, these probabilities based solely on the fixed
effects do not have an easy interpretation. Specifically, the
probabilities from the fixed effects only are not the average
probability of being happy as a function of stress.
```{r, fig.width = 6, fig.height = 5}
visreg(glmm2,
xvar = "STRESS",
partial=FALSE, rug=FALSE,
scale = "response")
```
To get the actual average probability of being happy given stress, we
have to get predicted probabilities for each person in the dataset
and average the results. In the resulting figure, the predicted
average probability never gets as low as when simply back-transforming
the predicted log odds based only on the fixed effects. This highlights
how these two approaches are not equivalent.
```{r, fig.width = 6, fig.height = 5}
## data for predictions, must set both stress and UserID
predictdata <- as.data.table(expand.grid(
STRESS = seq(0, 10, by = .1),
UserID = unique(d$UserID)))
## generate predicted probabilities
## type = "response" indicates predictions on the response
## (i.e., probability) scale
predictdata$yhat <- predict(glmm2, newdata = predictdata,
type = "response",
re.form = NULL)
## calculate the average probability across participants
## by stress level
predictdataavg <- predictdata[, .(yhat = mean(yhat)), by = STRESS]
ggplot(predictdataavg, aes(STRESS, yhat)) +
geom_line(size = 1)
```
One benefit of generating individual predictions is that we can plot
the individual predicted probabilities to get a sense of the
differences due to random effects. First, we do it on the logit scale,
then on the probability scale.
```{r, fig.width = 6, fig.height = 4}
## plot the individual curves
ggplot(predictdata, aes(STRESS, qlogis(yhat), group = UserID)) +
geom_line(alpha = .2) +
theme_bw() + ggtitle("logit scale")
ggplot(predictdata, aes(STRESS, yhat, group = UserID)) +
geom_line(alpha = .2) +
theme_bw() + ggtitle("probability scale")
```