---
title: "Design"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: vancouver.csl
vignette: >
%\VignetteIndexEntry{Design}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
```
## Principles
These are the guiding principles for this package:
1. Functionality is as agnostic to data format as possible (e.g. can be
used with SQL or Arrow connections, in a data.table format, or as a
data.frame).
2. Functions have consistent inputs and outputs (e.g. inputs and
outputs are the same, regardless of specific conditions).
3. Functions have predictable outputs based on inputs (e.g. if an input
is a data frame, the output is a data frame).
4. Functions have consistent naming based on their action.
5. Functions have limited additional arguments.
6. Functions are agnostic to the casing of input variables (upper or
lower case); all internal variables and output variables are lower
case.
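
As an illustration of principles 3 and 6, here is a minimal sketch for the in-memory data frame case only. The helper `standardise_names()` and the column names are hypothetical and are not part of the osdc API; the sketch only shows how a data frame goes in, a data frame comes out, and column names end up lower case whatever casing was supplied.

```r
# Hypothetical helper (not an osdc function): lower-case all column names
# so that downstream code can rely on a single casing.
standardise_names <- function(data) {
  stopifnot(is.data.frame(data))
  names(data) <- tolower(names(data))
  data
}

# Made-up input with mixed casing.
register <- data.frame(
  PNR = c("001", "002"),
  Atc = c("A10BA02", "A10AB01"),
  stringsAsFactors = FALSE
)

standardised <- standardise_names(register)
names(standardised)
#> [1] "pnr" "atc"
```

Supporting SQL or Arrow connections, as principle 1 asks, would need a different mechanism than assigning to `names()`, so this should be read as a sketch of the intent rather than of an implementation.
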
## Use cases
We make these assumptions on how this package will be used, based on our
experiences and expectations for use cases:
- Entirely used within the servers of Statistics Denmark (DST) or the
Danish Health Data Authority (SDS), since that is where their data are
kept.
- Used by researchers within or affiliated with Danish research
institutions.
- Used specifically within a Danish register-based context.
Below is a set of "narratives" or "personas" with associated needs that
this package aims to fulfil:
- "As a researcher, ..."
    - "... I want to determine which registers and variables to
      request from DST and SDS, so that I am certain I will be able to
      classify diabetes status of individuals in the registers."
    - "... I want to easily and simply create a dataset that contains
      data on diabetes status in my population, so that I can begin
      conducting my research that involves persons with diabetes
      without having to tinker with coding the correct algorithm to
      classify them."
    - "... I want to be informed early and in a clear way whether my
      data fits with the required data type and values, so that I can
      fix and correct these issues without having to do extensive
      debugging of the code and/or data."
## Core functionality
This is the list of functionality we aim to have in the osdc package:
1. Classify individuals' type 1 and type 2 diabetes status and create a
data frame with that information.
2. Provide helper functions to check and process individual registers
for the variables required to enter into the classifier.
3. Provide a list of required variables and registers in order to
calculate diabetes status.
4. Provide validation helper functions to check that variables match
what the algorithm expects.
5. Provide a common and easily accessible standard for determining
diabetes status within the context of research using Danish
registers.
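
To make the validation idea in item 4 concrete, the following is a minimal sketch of a check that fails early with a clear message. The function `check_required_variables()` and the variable names `pnr`, `atc` and `eksd` are used purely for illustration and are assumptions, not the package's actual interface or its actual list of required variables.

```r
# Hypothetical validator (not an osdc function): stop with an informative
# message when any required variable is missing from the data.
check_required_variables <- function(data, required) {
  stopifnot(is.data.frame(data), is.character(required))
  # Compare in lower case so the check is agnostic to input casing.
  missing_vars <- setdiff(tolower(required), tolower(names(data)))
  if (length(missing_vars) > 0) {
    stop(
      "The data is missing these required variables: ",
      paste(missing_vars, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Made-up data with two of three illustrative variables.
purchases <- data.frame(pnr = "001", atc = "A10BA02", stringsAsFactors = FALSE)

# Passes silently.
check_required_variables(purchases, c("PNR", "atc"))

# Fails early; the message names the missing variable.
result <- tryCatch(
  check_required_variables(purchases, c("pnr", "atc", "eksd")),
  error = function(e) conditionMessage(e)
)
result
#> [1] "The data is missing these required variables: eksd"
```
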
## Classifier algorithm
A more complete description of the classifier is found in Anders Aasted
Isaksen's [PhD Thesis](https://aastedet.github.io/dissertation/) as well
as the validation paper [@Isaksen2023]. The description below is a
concise version of those documents.
The algorithm first classifies individuals as having diabetes. It then
checks whether each of those individuals meets the criteria for type 1
diabetes; those who do not are classified as having type 2 diabetes.
Initial **diabetes** classification is defined as the second occurrence
of any of the listed inclusion events. Wherever possible, all available
data for each event is used, except for the purchases of
glucose-lowering drugs, since data is only available from 1997 onwards.
Inclusion criteria are:
1. HbA1c measurement of ≥48 mmol/mol.
2. Hospital diagnosis of diabetes.
3. Diabetes-specific services received at podiatrist.
4. Purchase of glucose-lowering drugs.
Exclusions are:
1. HbA1c:
    - Measurements taken during pregnancies, as elevated values could
      reflect gestational diabetes mellitus.
2. Drugs:
    - Brand drugs for weight loss, e.g. *Saxenda*.
    - Purchases during pregnancies, as these could be treatment
      for gestational diabetes mellitus.
    - Metformin for women below age 40, as that could be a treatment
      for polycystic ovary syndrome.
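
The "second occurrence" rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the table `events`, its columns (`pnr`, `date`, `event`) and the function `second_event_date()` are hypothetical, and the table is assumed to contain only inclusion events that remain after the exclusions have been applied.

```r
# Hypothetical event table: one row per inclusion event, all made-up data.
events <- data.frame(
  pnr = c("001", "001", "001", "002", "003", "003"),
  date = as.Date(c(
    "2010-03-01", "2009-06-15", "2012-01-10",
    "2011-05-05",
    "2015-02-02", "2015-02-20"
  )),
  event = c("hba1c", "diagnosis", "purchase", "purchase", "podiatrist", "hba1c"),
  stringsAsFactors = FALSE
)

# Return the date of each individual's second inclusion event (of any
# type). Individuals with fewer than two events are dropped.
second_event_date <- function(events) {
  ordered <- events[order(events$pnr, events$date), ]
  # Number each individual's events in date order.
  ordered$n <- ave(seq_along(ordered$pnr), ordered$pnr, FUN = seq_along)
  second <- ordered[ordered$n == 2, c("pnr", "date")]
  names(second)[2] <- "inclusion_date"
  rownames(second) <- NULL
  second
}

inclusion <- second_event_date(events)
inclusion
```

In this made-up data, individual `002` has a single event and so is not classified, while `001` and `003` are included on the date of their second event.
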
### Classifying type 2 diabetes
Among individuals classified as having diabetes (of unspecified type),
the inclusion criteria for type 2 diabetes are any of:
- Any purchases of non-insulin glucose-lowering drugs.
- A hospital diagnosis of type 2 diabetes as the most recent
type-specific diabetes diagnosis.
The exclusion criteria are:
- Women who have purchased only metformin and have any diagnoses of
polycystic ovarian syndrome or have purchases of clomifene or
combination drugs containing antiandrogens and oestrogens.
- Only one recorded inclusion event.
- No recorded inclusion events in the last 10 years prior to the most
recent date available for the individual (index date).
### Classifying type 1 diabetes
Among individuals classified as having diabetes (of unspecified type),
the inclusion criteria for type 1 diabetes are any of:
- Any purchases of insulin drugs.
- A hospital diagnosis of type 1 diabetes as the most recent
type-specific diabetes diagnosis.
The exclusion criteria are:
- Women with any diagnoses of gestational diabetes mellitus, who have
purchased glucose-lowering drugs only in the period from 280 days
prior to their first diagnosis of gestational diabetes mellitus
until 280 days after their last diagnosis.
- No purchases of glucose-lowering drugs or only one purchase and no
hospital records of type 1 diabetes.
- No insulin drug purchases in the last 10 years prior to the index
date.
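
Both type-specific exclusion lists include a rule about events in the 10 years prior to the index date. A minimal sketch of such a window check is shown here. The function `has_recent_purchase()` is hypothetical, and whether the window boundaries are inclusive is an assumption made for this example rather than something the algorithm description specifies.

```r
# Hypothetical helper (not an osdc function): TRUE if any purchase date
# lies in the 10 years up to and including the index date.
# Assumption: the window is (index date - 10 years, index date].
has_recent_purchase <- function(purchase_dates, index_date) {
  window_start <- seq(index_date, by = "-10 years", length.out = 2)[2]
  any(purchase_dates > window_start & purchase_dates <= index_date)
}

index_date <- as.Date("2020-12-31")

# One purchase inside the window, one outside.
recent <- has_recent_purchase(as.Date(c("2005-01-01", "2012-06-01")), index_date)

# Only a purchase from before the window.
not_recent <- has_recent_purchase(as.Date("2005-01-01"), index_date)

c(recent, not_recent)
#> [1]  TRUE FALSE
```
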
## Data required from registers
The following is a list of the variables required from specific
registers in order for the package to classify diabetes status:
```{r, echo=FALSE}
variable_description |>
mutate(Register = paste0(register_name, "(", register_abbrev, ")")) |>
select(Register, Variable = variable_name) |>
knitr::kable()
```
## References