---
title: "Data sources"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: vancouver.csl
vignette: >
%\VignetteIndexEntry{Data sources}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  results = "asis",
  collapse = TRUE,
  comment = "#>"
)
library(tidyverse)
library(osdc)
```
This document describes the data sources needed by the OSDC algorithm,
gives a brief overview of each source, and shows what the data might
look like. The final section contains information on how to gain access
to these data.

The algorithm uses these Danish registers as input data sources:
```{r, results='asis'}
osdc:::registers_as_md_table("Danish registers used in the OSDC algorithm.")
```
In a future revision, the algorithm may also use the Danish Medical
Birth Register to extend the period of valid inclusions further back in
time than is possible using obstetric codes from the National Patient
Register.

## Expected data structure
This section describes what the data sources are expected to look like
when they are input into the OSDC algorithm. We try to mimic, as closely
as possible, how the raw data looks within Statistics Denmark. Since
registers are often stored on a per-year basis, we don't expect a year
variable in the data itself. If you've processed the data so that it has
a year variable, you will likely need to use a split-apply-combine
approach when using the osdc package. We internally convert all variable
names to lower case and present them here in lower case, but in real
data the case may vary between data sources (and even between years in
the same data source).
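
If your data already carries a year variable, the split-apply-combine
step could look roughly like the sketch below. It uses base R only.
`process_one_year()` is a hypothetical stand-in for whatever per-year
processing you run (it is not an osdc function), and the column names
and values are invented for illustration.

```r
# Hypothetical register extract that has been stacked across years.
# Column names and values are invented for illustration only.
stacked <- data.frame(
  PNR = c("001", "002", "001", "003"),
  ATC = c("A10BA02", "A10AB01", "A10BA02", "A10BJ02"),
  year = c(2017L, 2017L, 2018L, 2018L)
)

# Lower-case the variable names, since case can vary between sources.
names(stacked) <- tolower(names(stacked))

# Split: one data frame per year, without the year variable.
by_year <- split(stacked[, setdiff(names(stacked), "year")], stacked$year)

# Apply: `process_one_year()` is a placeholder, not part of osdc.
# Here it simply keeps the first row per person.
process_one_year <- function(data) {
  data[!duplicated(data$pnr), , drop = FALSE]
}
processed <- lapply(by_year, process_one_year)

# Combine: stack the per-year results back into one data frame.
combined <- do.call(rbind, processed)
```

Splitting first keeps each piece in the per-year shape described above,
so any per-year step sees data without a year variable.
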
A small note about the National Patient Register: it contains several
tables and types of data. The algorithm uses only the hospital diagnosis
data, which is contained in four registers. These form two pairs of
related registers, one pair used before 2019 (LPR2) and one used from
2019 onward (LPR3). The LPR2 to LPR3 equivalents are `lpr_adm` to
`kontakter` and `lpr_diag` to `diagnoser`. Most of the variables have
equivalents as well, with one caveat: while `c_spec` is the LPR2
equivalent of `hovedspeciale_ans` in LPR3, the specialty values in
`hovedspeciale_ans` are coded as literal specialty names, whereas
`c_spec` contains padded integer codes.
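
Because the two variables are coded differently, comparing specialties
across LPR2 and LPR3 needs an explicit lookup. The sketch below shows
one way this might be done in base R. The lookup values are placeholders
rather than real register values, and the two-digit padding width is an
assumption.

```r
# Placeholder lookup from LPR2-style padded codes to LPR3-style literal
# names. These values are invented; real values must come from the
# register documentation.
specialty_lookup <- c(
  "01" = "specialty a",
  "02" = "specialty b"
)

# Codes may lose their padding if they were read as integers, so re-pad
# them first (a width of two digits is assumed here).
c_spec <- c(1L, 2L, 1L)
c_spec_padded <- sprintf("%02d", c_spec)

# Translate the padded codes to the literal names.
hovedspeciale_ans <- unname(specialty_lookup[c_spec_padded])
```
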
At Statistics Denmark, these tables are provided as separate files for
each calendar year prior to 2019 (in LPR2 format) and a single file
containing all the data from 2019 onward (in LPR3 format). The two
tables in each pair can be joined with the `recnum` variable (LPR2 data)
or the `dw_ek_kontakt` variable (LPR3 data).
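
As a rough illustration of these joins, the sketch below uses base R's
`merge()` on tiny simulated tables. Only the join keys `recnum` and
`dw_ek_kontakt` are taken from the text above. The other column names
and all values are assumptions made for the example.

```r
# Simulated LPR2 tables. Apart from `recnum`, the columns and all values
# are illustrative assumptions.
lpr_adm <- data.frame(
  pnr = c("001", "002"),
  recnum = c("r1", "r2")
)
lpr_diag <- data.frame(
  recnum = c("r1", "r1", "r2"),
  c_diag = c("code_a", "code_b", "code_c")
)

# LPR2: join admissions to their diagnoses on `recnum`.
lpr2 <- merge(lpr_adm, lpr_diag, by = "recnum")

# Simulated LPR3 tables. Apart from `dw_ek_kontakt`, the columns and all
# values are illustrative assumptions.
kontakter <- data.frame(
  cpr = "003",
  dw_ek_kontakt = "k1"
)
diagnoser <- data.frame(
  dw_ek_kontakt = c("k1", "k1"),
  diagnosekode = c("code_a", "code_b")
)

# LPR3: join contacts to their diagnoses on `dw_ek_kontakt`.
lpr3 <- merge(kontakter, diagnoser, by = "dw_ek_kontakt")
```

Each admission or contact can have several diagnoses, so the joined
tables have one row per diagnosis rather than one row per contact.
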
```{r}
# For each register, emit a header, a table of its variables, and a
# table of simulated example data.
for (register in osdc:::get_register_abbrev()) {
  print(glue::glue("### {osdc:::register_as_md_header(register)}"))
  osdc:::variables_as_md_table(
    register,
    caption = glue::glue("Variables and their descriptions within the `{register}` register.")
  ) |>
    print()
  osdc:::register_data_as_md_table(
    register,
    caption = glue::glue("Simulated example of what the data looks like for the `{register}` register.")
  ) |>
    print()
}
```
## Getting access to data
The data described above is available through Statistics Denmark and the
Danish Health Data Authority. Researchers must be affiliated with an
approved research institute in Denmark, and fees apply. Information on
how to gain access to the data can be found at
<https://www.dst.dk/en/TilSalg/Forskningsservice>.