-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathslides_update.qmd
543 lines (306 loc) · 18.8 KB
/
slides_update.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
---
title: "Mediation Analyses"
format: revealjs
slide-number: c/t # current/total format
editor: visual
---
## Welcome to the workshop on Mediation Analysis🎉
```{r}
knitr::include_graphics("images/cover.png")
```
#### Instructors: Daniel Witte, Jie Zhang
#### ***Steno Diabetes Center Aarhus, Aarhus University, Denmark***
## Outlines 📌
## In this course, you will:
- **Understand the fundamentals of mediation analysis**\
*Apply what is mediation analysis? When and why should you use this?*
- **Master the theoretical framework**\
*Understand traditional mediation concepts; Learn principles of causal mediation analysis*
- **Conduct practical analyses using R software**\
*Apply appropriate statistical methods to perform mediation analysis*
- **Interpret and communicate results**\
*Analyze outputs from mediation analyses and draw meaningful conclusions from results*
## What is mediation?
Mediation analysis is the study of pathways and mechanisms through which an *exposure* or *intervention* impacts an outcome.
In clinical and epidemiological research, the primary focus is often on determining whether a specific intervention has an effect on a disease or health outcome. Once this effect is established, the next natural question is to explore the "black box"—the underlying mechanisms that explain how the intervention (or exposure) leads to the observed outcome. As we are not only interested in whether an intervention works, but *how* it works.
## Mediation analyses
The techniques to assess the relative magnitude of the direct and indirect effects is referred to as 'mediation analysis'.
## Mediation analyses
The purpose of mediation analysis is to determine if the effect of a treatment (A) on an outcome (Y) can be explained by a third mediating variable (M). Thus, mediation analysis not only answers whether two variables are related, but also **why and how**.
## Motivation for mediation analysis
1. **Explanation and Understanding**
2. **Confirmation and Refutation of Theory**
3. **Refining Interventions**
## Mediation analysis
Mediation analysis is becoming more popular. [Fig. 1](https://pmc.ncbi.nlm.nih.gov/articles/PMC8496983/#F1) shows that both the number of entries in Google Scholar and the number of peer-reviewed articles in PsycINFO that have “mediation analysis” in the title or text have been growing exponentially.
```{r}
#| label: fig-mediation-analysis
#| fig-cap: "Trend Mediation Analysis"
#| fig-width: 8
#| fig-height: 6
#| fig-align: "center"
knitr::include_graphics("images/trend_mediation.jpg")
```
## Mediation
Overall relationship between A and Y:
```{r, out.width='80%', out.height='50%'}
knitr::include_graphics("images/a-y.png")
```
Relationship between A and Y through M:
```{r,out.width='80%', out.height='50%'}
knitr::include_graphics("images/a-m-y.png")
```
## Mediation
How much does BMI explain the relation between physical activity and the risk of diabetes?
```{r}
knitr::include_graphics("images/example1.jpg")
```
How much does inflammation explain the relation between red meat and the risk of diabetes?
```{r}
knitr::include_graphics("images/example2.png")
```
## Traditional approaches for mediation analysis
The two traditional approaches to mediation analysis are **the difference method** and **the product method** (also known as the Baron & Kenny-method).
## Method 1: Baron & Kenny (the product method)
The following criteria need to be satisfied for a variable to be considered a mediator:
1. The exposure should be associated with the mediator.
2. The mediator should be associated with the outcome.
3. The exposure should be associated with the outcome.
4. When controlling for the mediator, the association between the exposure and outcome should be reduced (up to debate).
## Baron&Kenny-four steps
##### **STEP 1:** **test association between exposure and outcome (c)**
##### **STEP 2:** test association between exposure and mediator (a)
##### **STEP 3:** test association between mediator and outcome, controlling for exposure (b)
##### **STEP 4:** mediation is shown if association between X and Y is reduced to non-significance when M is controlled (c')
```{r, out.width='60%', out.height='60%'}
knitr::include_graphics("images/basic.png")
```
## Method 2: Difference approach
Total effect: regress A on X
Direct effect: regress Y on A and M
If a mediation effect exists, the effect of A on Y will be attenuated when M is included in the regression, indicating the effect of A on Y goes through M. If the effect of A on Y completely disappears, M fully mediates between A and Y (full mediation). If the effect of A on Y still exists, but in a smaller magnitude, M partially mediates between A and Y (partial mediation).
## Build models
Mediator-model:
$E(M|A=a, C=c) = \beta_0 + \beta_1a + \beta_2c$ (1.1)
Outcome model with adjustment for mediator:
$E(Y|A=a, M=m, C=c) = \theta_0 + \theta_1a + \theta_2m + \theta4c$ (1.2)
Outcome model without adjustment for mediator:
$E(Y|A=a, C=c) = \theta_0' + \theta_1'a + \theta4'c$ (1.3)
## Summary
- **Total effect** = $\theta_1'$, the total effect of the independent variable on the dependent variable.
- **Direct effect** = $\theta_1$, the effect of the independent variable on the dependent variable that is not mediated by the mediator.
- **Mediation effect** = $\beta_1 \cdot \theta_2$ (product method).
- **Mediation effect** = $\theta_1' - \theta_1$ (difference method).
*The terms mediation effect and indirect effect are used synonymously*.
## Product method VS Difference method
The algebraic equivalence of the indirect effect using the product method and the difference method will coincide for a continuous outcome on the difference scale. However, the two methods diverge when using a binary outcome and logistic regression.
## Limitations of the traditional approach
- Non-linearity
- Interactions
- Multiple mediators
## Further limitations of the traditional approach
⚠️ Unmeasured confounding of the Mediator-Outcome path
This assumption can be violated in both observational studies as well as RCTs because while the exposure can sometimes be randomized, it is often not the case that both exposure and mediator are randomized.
## Problem 1-Intermediate confounding
```{r}
knitr::include_graphics("images/intermediate_eg.jpg")
```
## Problem 1-Intermediate confounding
- In an **RCT study**, pregnant women are randomized to receive a **lifestyle intervention (A)** aimed at weight loss. While **A (lifestyle intervention)** can be randomized, **M (birth weight)** cannot, as it is a pre-existing condition that results from the pregnancy.
- Gestational diabetes is a descent of A (lifestyle intervention on pregnant women), and a cause of Y (child adiposity), you cannot condition on it because it is a mediator of A-Y(lifestyle intervention-child adiposity);
- Gestational diabetes is also a confounder of M-Y relationship (birth weight-child adiposity), it biases the M-Y path if you do not condition on it.
## Problem 2-Collider bias
```{r}
knitr::include_graphics("images/intermediate_eg2.jpg")
```
## Problem 2-Collider bias
In observational studies, the situation might get even more complex.
- C1, C2 represent a series of confounders on the pathways.
- Notably, in the presence of **C2**, **gestational diabetes (L)** acts as a **collider** on the pathway **A → L ← C2 → Y** (maternal pre-pregnancy BMI → gestational diabetes **←** C2 → child adiposity). If we adjust for **gestational diabetes (L)**, it will **open a backdoor path**, introducing bias in the estimation of both **direct and indirect effects**. This could distort the causal interpretation of mediation.
## When can we use the traditional approach
When fulfill the criteria, simple tools like regression can be used to estimate a causal mediation effect:
- no unmeasured confounding
- no exposure-mediator interaction
- linear relationship
- rare binary outcome
# Introduction to causal mediation analysis
## Concept of Cause
### What is a cause?
Causation is everywhere in life.
If I do A, then Y will happen.
If I press the switch, the light will come on.
```{r}
knitr::include_graphics("images/bulb.png")
```
## What is cause inference?
#### Inferring the effects of any treatment/intervention/risk factor/policy
Examples:
- Effects of treatment on a disease
- Effect of intervention (e.g. obesity) on risk of death
- Effect of poverty on mental health
## Individual causal effect
When investigating health outcomes, we would ideally want to know if **you** do A, then Y will happen. We could have a specific question:
> Will eating more red meat give me higher blood glucose in 1 year?
To answer this question, we would ideally have you consume more red meat over 1 year and measure your blood glucose levels. Then, we would turn back time, and make you eat something else over 1 year and then measure your blood glucose levels again. If there is a **difference** between your two outcomes, then we say there is a causal effect.
## Individual causal effect
> Will eating more red meat give me higher blood glucose in 1 year?
```{r}
knitr::include_graphics("images/causal_eg1.jpg")
```
## Individual causal effect
> Will eating more red meat give me higher blood glucose in 1 year?
::: callout-note
A = received treatment/intervention/exposure (e.g., 1 = intervention, 0 = no intervention)
Y = observed outcome (e.g., 1 = developed the outcome, 0 = no outcome)
$Y^{a=1}$ = Counterfactual outcome under treatment a =1(i.e., the outcome had everyone, counter to the fact, received treatment a = 1)
$Y^{a=0}$ = Counterfactual outcome under treatment a=0 (i.e., the outcome had everyone, counter to the fact, received treatment a = 0)
:::
## Individual causal effect
> Will eating more red meat give me higher blood glucose in 1 year?
```{r}
knitr::include_graphics("images/causal_eg2.jpg")
```
But we can never do this in the real world. We could Only observe one of Y(0) or Y(1) the other is counterfactual.
## Average causal effect
Instead, we can perform a randomized controlled trial. We now ask a slightly different question:
> Will eating more red meat give adults higher blood glucose in 1 year?
We can randomely assigning one group to consume more red meat and the other group to consume more of something else over 1 year. Then we compare the **average** blood glucose levels after 1 year in each of the groups. If there is a difference, we could say there is an average causal effect.
## Average causal effect
$E[Y^{a=1}]$ the average counterfactual outcome, had all subjects in the population received treatment a = 1.
$E[Y^{a=0}]$ the average counterfactual outcome, had all subjects in the population received no treatment a = 0.
Average treatment effect (ATE) : $E[Y^{a=1}]$ -$E[Y^{a=0}]$
## Definition of a causal effect
More formally we can now define a causal effect:
$E[Y^{a=1} = 1] - E[Y^{a=0} = 1] \ne 0$
## **Association vs causation**
What makes it complicated to estimate a causal effect is that we cannot observe the outcome under different treatments.
When we only have a subset of the outcomes, we have an association. This is illustrated in @fig-causation-association.
\
```{r}
#| label: fig-causation-association
#| fig-cap: "Relationship between causation and association"
knitr::include_graphics("images/causation-association.png")
```
## [Key As](images/filename.png)sumptions
If we want to infer a causal effect (i.e., what would have happened, had everyone done A=1 vs A=0), we need three assumptions to be fulfilled:
- Exchangeability
- Consistency
- Positivity
## Causal mediation analysis
Causal mediation analyses help you establish whether treatment *causes* the outcome because it *causes* the mediator.
To do this, causal mediation seek to understand how the paths behave under circumstances different from the observed circumstances (e.g., interventions).
## Why use causal mediation analysis?
Causal mediation analysis is an extension of the traditional approach by:
- outlining all confounding assumptions needed
- handling non-linearity and interaction
- clearly defining estimands of interest
## Causal approach brings alternative parameters
- Standard Mediation Analysis
- Total Effect
- Direct Effect
- Indirect Effect
- Counterfactual-based Causal Mediation Analysis
- Controlled Direct Effect (CDE(m))
- Natural Direct Effect (NDE)
- Natural Indirect Effect (NIE)
## Key Strengths
- can provide causal estimates even when mediator or outcome are binary
- can deal with interaction between exposure and mediator
- can deal with intermediate confounding
## Non-linearity and interactions
Neither the product method nor the difference method can take interaction and non-linearity into account.
Causal mediation analysis can take this into account. It can do this using a regression-based approach. It can also use other causal inference analysis methods such as g-computation, that are different from the traditional regression approach in that:
- it builds a causal model. This model can include non-linearity and interactions
- then artificially manipulate the data to set the treatment **and** the mediator to certain values
- then predict the outcome using the causal model and contrast the outcomes
## Defining estimands
Imagine we have a hypothetical randomized controlled trial where we give participants treatment or no treatment on a specific outcome Y.
$Y^{a=1} - Y^{a=0}$
For mediation, we are also interested in the effect of a mediator on this pathway. Now image that we also intervene on the mediator in a new hypothetical randomized controlled trial.
$Y^{m=1} - Y^{m=0}$
Now consider if we, in the same trial, could intervene on both because we are interested in whether treatment *causes* the outcome because it *causes* the mediator.
## Notation
$Y^a$ = a subject's outcome if treatment A were set, possible contrary to fact, to a
$M^a$ = a subject's value of the mediator if the exposure A were set to the value of a
$Y^{a,m}$ = a subject's outcome if A were set to a and M were set to m
$Y^{a,M_a}$ = a subject's outcome if A were set to a and M were set the value m would have had had a been set to a. Note, this is a nested counterfactual
## Effect Decomposition
We can now define these estimands:
- the controlled direct effect (CDE)
- natural direct effect (NDE)
- natural indirect effect (NIE)
# Estimation of effects using causal mediation analysis
## Natural direct effect
The NDE is how much the outcome would change if the treatment a, was set at its natural value versus 0 but for each individual the mediator was kept at the level it would have taken, for that individual, in the absence of the exposure.
## Controlled direct effect
For the controlled direct effect we set m to a specific value. The CDE answer the question, what would be the effect of A on Y, when fixing M at a specific value for everyone in the population.
## Natural indirect effect
The NIE is how much the outcome would change on average if the treatment was fixed at level a but the mediator was changed from the level it would take if a\* = 0 to the level it would take if a =1 .
Note that exposure has to have an effect on M otherwise this will be zero.
The NIE asks the question: the effect of exposure that 'would be prevented if the exposure did not cause the mediator' (i.e., the portion of the effect for which mediation is 'necessary')
This is often the effect we are interested in in biomedical research for questions regarding mediation.
## Proportion mediation
From this, we can calculate the proportion mediated.
$PM = \frac{TNIE}{TE}$
## Total effect
The total effect can be decomposed as:
$TE = PNDE + TNIE$
This is the overall effect of x on y.
###
## Further decomposition
```{r}
knitr::include_graphics("images/a-i-y.jpg")
```
## Effect Decomposition (Robins and Greenland)
When there are interaction and non-linearity, different ways of accounting for the interaction:
- Pure natural direct effect (PNDE)-indirect effect due to mediator alone
- Total natural indirect effect (TNIE)-indirect effect due to mediator and its interaction with the exposure
- Pure natural indirect effect (PNIE)
- Total nature direct effect (TNDE)
TE = PNDE + TNIE = TNDE + PNIE
### Controlled direct effect
The effect of A on Y not mediated through M. Fixing the value of M to m.
$Y^{a=1,m}$ - $Y^{a=0,m}$
We intervene on $a$ but fix $m$ to a certain value. The CDE is how much the outcome would change on average if the mediator were fixed at level m uniformly in the population but the treatment were changed from 0 to 1.
This could be relevant in the context of a change in a policy that impacted the mediator for everyone. For instance, if air pollution was a mediator between physical activity and cardiovascular disease risk. If a new policy would change the level of air pollution for all while we implement an intervention to increase biking in the city.
This effect is not used that often. But can be highly relevant in some situations.
## (Pure) Natural direct effect
The effect that would remain, if we were to disable the pathway from exposure to mediator.
$Y^{a=1,M_a=0}$ - $Y^{a=0,M_a=0}$
The PNDE is how much the outcome would change if the exposure was set at a = 1 versus a\* = 0 but for each individual the mediator was kept at the level it would have taken, for that individual, in the absence of the exposure.
Note that the word "natural" refers to the nested counterfactual, the level the mediator would have taken in the absence of exposure. What it would naturally have been in the absence of exposure.
## Total natural direct effect
$Y^{a=1,M_a=1}$ - $Y^{a=0,M_a=1}$
Note, different from above in that the mediator is kept at the level it would have taken in the **presence** of the exposure.
## (Total) Natural indirect effect
The effect of the mediator pathway.
$Y^{a=1,M_a=1}$ - $Y^{a=1,M_a=0}$
The NIE is how much the outcome would change on average if the exposure were fixed at level a = 1 but the mediator were changed from the level it would take if a\* = 0 to the level it would take if a = 1.
Note that exposure has to have an effect on M otherwise this will be zero.
## Pure natural indirect effect
$Y^{a=0,M_a=1}$ - $Y^{a=0,M_a=0}$
Note, this is different from the TNIE in that the exposure is set to no intervention.
## Interaction effects
$INT_{ref} = PNDE - CDE$
Mediation interaction:
$INT_{med} = TNIE - PNIE$
### Proportions
Proportion CDE:
$prop^{CDE} = CDE / TE$
Proportion $INT_{ref}$
$prop^{INT_{ref}} = INT_{ref} / TE$
Proportion $INT_{med}$
$prop^{INT_{med}} = INT_{med} / TE$
Proportion pure natural indirect effect:
$prop^{PNIE} = PNIE / TE$
Proportion mediated:
$PM = TNIE / TE$
Proportion attributable to interaction:
$INT = (INT_{ref} + INT_{med}) / TE$
Proportion eliminated:
$PE = (INT_{ref} + INT_{med} + PNIE) / TE$
##
```{r}
knitr::include_graphics("images/ladder.png")
```
## Code-along practice