-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathdecadal_age_distribution.Rmd
60 lines (44 loc) · 2.31 KB
/
decadal_age_distribution.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---
title: "Decadal age distribution"
output: html_notebook
---
This will show how the age distribution within different decadal bands of birth cohorts varies in Scotland.
**Note to self**: The birth cohort decades used are 1905-1965, so 1965-1975 not a clear limitation for this analysis. (But could be if the analysis were to be replicated in another decade.)
```{r}
rm(list = ls())
library(tidyverse)
dta <- read_csv("hmd_explorer/data/hmd_data.csv")
dta %>%
filter(code == "GBR_SCO") %>%
filter(Age == 0) %>%
filter(gender != "Total") %>%
mutate(
birth_decade = cut(Year, breaks = seq(1905, 2015, by = 10))
) %>%
filter(!is.na(birth_decade)) %>%
group_by(gender, birth_decade) %>%
mutate(year_in_decade = 1:10) %>%
mutate(prop_popn_in_decade = num_population / sum(num_population)) %>%
ggplot(aes(x = factor(year_in_decade), y = prop_popn_in_decade, group = gender, fill = gender, colour = gender)) +
geom_histogram(stat= "identity") +
facet_wrap(~birth_decade)
```
From this it appears that the only notable asymmetry in the distribution of births within the birth cohort decade is for the 1965-74 birth cohort decade. (i.e. the decade where contraceptives became much more widely available and effective). This means that this birth cohort decade may be biased towards older ages, and so there's a risk of artefact, especially in comparison with the following decade. i.e. even if there's no genuine improvement in mortality/morbidity between the 1965-1974 and 1975-1984 decade, some apparent improvement would be expected from the differential age structure in the former decade alone.
Fortunately the last birth cohort considered is 1955-1965.
Let's now try to tabulate mean and sd for the above
```{r}
dta %>%
filter(code == "GBR_SCO") %>%
filter(Age == 0) %>%
filter(gender != "Total") %>%
mutate(
birth_decade = cut(Year, breaks = seq(1905, 2015, by = 10))
) %>%
filter(!is.na(birth_decade)) %>%
group_by(gender, birth_decade) %>%
mutate(year_in_decade = 1:10) %>%
mutate(prop_popn_in_decade = num_population / sum(num_population)) %>%
summarise(mean_in_decade = sum(num_population * year_in_decade) / sum(num_population)) %>%
ggplot(aes(x = birth_decade, y = mean_in_decade, colour = gender, group = gender)) +
geom_line() + geom_point() + coord_flip()
```