feat: include variable description as exported dataset as well as a function to get the required variables #48

Merged on Mar 15, 2024 (15 commits)
Changes from 11 commits
31 changes: 24 additions & 7 deletions DESCRIPTION
@@ -3,13 +3,30 @@ Package: osdc
 Title: Open Source Diabetes Classifier for Danish Registers.
 Version: 0.0.1.9000
 Authors@R: c(
-    # cre = maintainer, even though it translates to "creator"
-    person(c("Luke", "William"), "Johnston", , "lwjohnst@gmail.com",
-           comment = c(ORCID = "0000-0003-4169-2616"), role = c("aut", "cre")),
-    person(c("Signe", "Kirk"), "Brødbæk", , "signekb@clin.au.dk", role = "aut"),
-    person(c("Anders", "Aasted"), "Isaksen", , "andaas@rm.dk", role = "aut"),
-    person("Steno Diabetes Center Aarhus", role = "cph"),
-    person("Aarhus University", role = "cph")
+    person(
+      c("Luke", "William"), "Johnston",
+      email = "lwjohnst@gmail.com",
+      role = c("aut", "cre"),
+      comment = c(ORCID = "0000-0003-4169-2616")
+    ),
+    person(
+      c("Signe", "Kirk"), "Brødbæk",
+      email = "signekb@clin.au.dk",
+      role = "aut"
+    ),
+    person(
+      c("Anders", "Aasted"), "Isaksen",
+      email = "andaas@rm.dk",
+      role = "aut"
+    ),
+    person(
+      "Steno Diabetes Center Aarhus",
+      role = "cph"
+    ),
+    person(
+      "Aarhus University",
+      role = "cph"
+    )
   )
 Description: This classifier first identifies a population of individuals
     with any type of diabetes mellitus and then splits this population
25 changes: 23 additions & 2 deletions R/get-variables.R
@@ -2,10 +2,31 @@
 #' Get a list of the registers' abbreviations.
 #'
 #' @return A character string.
-#' @export
+#' @keywords internal
 #'
 #' @examples
 #' get_register_abbrev()
 get_register_abbrev <- function() {
-  unique(required_variables$register_abbrev)
+  unique(variable_description$register_abbrev)
 }
+
+#' Get a list of required variables from a specific register.
+#'
+#' @param register The abbreviation of the register name. See list of
+#' abbreviations in [get_register_abbrev()].
+#'
+#' @return A character vector of variable names.
+#' @keywords internal
+#'
+#' @examples
+#' get_required_variables("bef")
+get_required_variables <- function(register) {
+  if (!checkmate::test_scalar(register)) {
+    cli::cli_abort("You are giving too many registers, please give only one.")
+  }
+  checkmate::assert_choice(register, get_register_abbrev())
+  register <- rlang::arg_match(register, get_register_abbrev())
+  variable_description |>
+    dplyr::filter(.data$register_abbrev == register) |>
+    dplyr::pull(.data$variable_name)
+}
Binary file removed R/sysdata.rda
Binary file not shown.
15 changes: 15 additions & 0 deletions R/variable-description.R
@@ -0,0 +1,15 @@
+#' Variables from registers and their descriptions that are required for the
+#' OSDC algorithm.
+#'
+#' @format ## `variable_description`
+#' A data frame with 39 rows and 6 columns:
+#' \describe{
+#'   \item{register_name}{The official, full Danish name of the register.}
+#'   \item{register_abbrev}{The official abbreviation for the register.}
+#'   \item{variable_name}{The official name of the variable found in the register.}
+#'   \item{years_covered}{The years for which the variable is available.}
+#'   \item{danish_description}{The official description in Danish for the variable.}
+#'   \item{english_description}{The translated description in English for the variable.}
+#' }
+#' @source <https://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html>
+"variable_description"
16 changes: 11 additions & 5 deletions R/verify-variables.R
@@ -1,8 +1,12 @@
 #' Verify that the dataset has the required variables for the algorithm.
 #'
+#' Use this function within an `if` condition inside another function so that
+#' the calling function can raise an informative error itself. That way, the
+#' error message points to where the problem actually occurs, rather than to
+#' this function.
+#'
 #' @param data The dataset to check.
-#' @param register The abbreviation of the register name. See list of
-#' abbreviations in [get_register_abbrev()].
+#' @inheritParams get_required_variables
 #'
 #' @return Either TRUE if the verification passes, or a character string if
 #' there is an error.
@@ -16,10 +20,12 @@
 #' verify_required_variables(example_bef_data, "bef")
 verify_required_variables <- function(data, register) {
   checkmate::assert_choice(register, get_register_abbrev())
-  expected_variables <- required_variables |>
-    dplyr::filter(.data$register_abbrev == register) |>
-    dplyr::pull(.data$variable_name)
+
+  expected_variables <- get_required_variables(register)
+
   actual_variables <- names(data)
+
+  # TODO: Consider using/looking into rlang::try_fetch() to provide contextual error messages.
   checkmate::check_names(
     x = actual_variables,
     must.include = expected_variables
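The roxygen note above recommends calling `verify_required_variables()` inside an `if` condition of another function. A minimal sketch of that pattern follows. It is not the package's implementation: the `variable_description` table, its variable names (`var_a`, `var_b`, `var_c`), the base-R stand-in for `verify_required_variables()`, and the caller `prepare_bef()` are all assumptions made up for illustration.

```r
# Hypothetical stand-in for the package's `variable_description` dataset.
# The variable names are invented placeholders, not real register variables.
variable_description <- data.frame(
  register_abbrev = c("bef", "bef", "atc"),
  variable_name = c("var_a", "var_b", "var_c"),
  stringsAsFactors = FALSE
)

# Simplified base-R stand-in for verify_required_variables(): returns TRUE
# when every required column is present, otherwise a character string that
# names the missing columns.
verify_required_variables <- function(data, register) {
  expected <- variable_description$variable_name[
    variable_description$register_abbrev == register
  ]
  missing <- setdiff(expected, names(data))
  if (length(missing) == 0) {
    return(TRUE)
  }
  paste0("Missing required variables: ", paste(missing, collapse = ", "))
}

# Hypothetical caller: the check sits in an `if` condition, so the error is
# raised from the function the user actually called.
prepare_bef <- function(data) {
  check <- verify_required_variables(data, "bef")
  if (!isTRUE(check)) {
    stop(check, call. = FALSE)
  }
  data
}

complete <- data.frame(var_a = 1, var_b = 2, extra = 3)
incomplete <- data.frame(var_a = 1)

# Passes: all required columns are present (extra columns are allowed).
prepared <- prepare_bef(complete)

# Fails: capture the error message instead of stopping the script.
result <- tryCatch(
  prepare_bef(incomplete),
  error = function(e) conditionMessage(e)
)
print(result)
```

Because the verifier returns either `TRUE` or a message, the caller decides how and where to signal the problem, which is the design choice the documentation describes.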
14 changes: 11 additions & 3 deletions data-raw/variable-description.R
@@ -2,7 +2,15 @@
 
 library(tidyverse)
 
-required_variables <- read_csv(here::here("data-raw/variable_description.csv")) |>
-  select(register_abbrev = raw_register_filename, variable_name)
+variable_description <- here::here("data-raw/variable_description.csv") |>
+  read_csv() |>
+  select(
+    register_name,
+    register_abbrev = raw_register_filename,
+    variable_name,
+    years_covered,
+    danish_description,
+    english_description
+  )
 
-usethis::use_data(required_variables, overwrite = TRUE, internal = TRUE)
+usethis::use_data(variable_description, overwrite = TRUE)
Binary file added data/variable_description.rda
Binary file not shown.
12 changes: 12 additions & 0 deletions tests/testthat/test-get-variables.R
@@ -0,0 +1,12 @@
+test_that("internal `get_` variable helper functions give correct output", {
+
+  # Should be character. Not sure if other tests are needed here.
+  expect_type(get_register_abbrev(), "character")
+  expect_type(get_required_variables("bef"), "character")
+
+  # Only able to use register ids that are real.
+  expect_error(get_required_variables("fake"))
+
+  # Only allows a vector of one.
+  expect_error(get_required_variables(c("bef", "atc")))
+})
2 changes: 1 addition & 1 deletion tests/testthat/test-verify-variables.R
@@ -21,5 +21,5 @@ test_that("the required variables are present in the dataset", {
   expect_true(verify_required_variables(bef_complete_extra, "bef"))
 
   # When it is a character output, it is a fail.
-  expect_character(verify_required_variables(bef_incomplete, "bef"))
+  expect_type(verify_required_variables(bef_incomplete, "bef"), "character")
 })
45 changes: 45 additions & 0 deletions vignettes/design.Rmd
@@ -27,3 +27,48 @@ These are the guiding principles for this package:
    is a data frame, the output is a data frame).
 4. Functions have consistent naming based on their action.
 5. Functions have limited additional arguments.
+
+## Use cases
+
+We make these assumptions about how this package will be used, based on our
+experiences and expectations for use cases:
+
+- Entirely used within the servers of Statistics Denmark (DST) or the
+  Danish Health Data Authority (SDS), since that is where their data are
+  kept.
+- Used by researchers within or affiliated with Danish research
+  institutions.
+- Used specifically within a Danish register-based context.
+
+Below is a set of "narratives" or "personas" with associated needs that
+this package aims to fulfil:
+
+- "As a researcher, ..."
+    - "... I want to determine which registers and variables to
+      request from DST and SDS, so that I am certain I will be able to
+      classify diabetes status of individuals in the registers."
+    - "... I want to easily and simply create a dataset that contains
+      data on diabetes status in my population, so that I can begin
+      conducting my research that involves persons with diabetes
+      without having to tinker with coding the correct algorithm to
+      classify them."
+    - "... I want to be informed early and in a clear way whether my
+      data fits with the required data type and values, so that I can
+      fix and correct these issues without having to do extensive
+      debugging of the code and/or data."
+
+## Core functionality
+
+This is the list of functionality we aim to have in the osdc package:
+
+1. Classify individuals' type 1 and type 2 diabetes status and create a
+   data frame with that information.
+2. Provide helper functions to check and process individual registers
+   for the variables required to enter into the classifier.
+3. Provide a list of required variables and registers in order to
+   calculate diabetes status.
+4. Provide validation helper functions to check that variables match
+   what the algorithm expects.
+5. Provide a common and easily accessible standard for determining
+   diabetes status within the context of research using Danish
+   registers.
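
As a rough illustration of the first researcher narrative (deciding which registers and variables to request), the sketch below builds a per-register list of required variables. It assumes a toy `variable_description` table with invented variable names (`var_a`, `var_b`, `var_c`) and uses simplified base-R stand-ins for `get_register_abbrev()` and `get_required_variables()`; the real functions in this PR validate input with checkmate and rlang and read from the exported dataset.

```r
# Hypothetical stand-in for the exported `variable_description` dataset.
# The variable names are invented placeholders, not real register variables.
variable_description <- data.frame(
  register_abbrev = c("bef", "bef", "atc"),
  variable_name = c("var_a", "var_b", "var_c"),
  stringsAsFactors = FALSE
)

# Simplified stand-in: the unique register abbreviations in the table.
get_register_abbrev <- function() {
  unique(variable_description$register_abbrev)
}

# Simplified stand-in: accepts exactly one known register abbreviation and
# returns the names of the variables required from that register.
get_required_variables <- function(register) {
  stopifnot(
    length(register) == 1,
    register %in% get_register_abbrev()
  )
  variable_description$variable_name[
    variable_description$register_abbrev == register
  ]
}

# Build a named list with one entry per register, which could serve as a
# checklist when preparing a data request.
registers <- get_register_abbrev()
required_by_register <- lapply(
  stats::setNames(registers, registers),
  get_required_variables
)
str(required_by_register)
```

Looping over `get_register_abbrev()` rather than passing a vector keeps each call within the one-register-at-a-time contract that the tests in this PR enforce.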