list_vpts_aloft() #579

Merged 66 commits on May 25, 2023
eacf5fc
Create download_vpts_aloft.R
peterdesmet May 16, 2023
2d38e3e
use " not '
PietrH May 17, 2023
db50fa5
limit to 80 cols
PietrH May 17, 2023
ec534e6
init list_vpts_aloft() + test
PietrH May 17, 2023
30b48c1
Merge pull request #576 from adokter/master
PietrH May 17, 2023
78bbfda
Update NAMESPACE
PietrH May 17, 2023
05bf9e8
Check source
PietrH May 17, 2023
bb0ff4e
Check output type
PietrH May 17, 2023
aa52213
return vector of URLs
PietrH May 17, 2023
c1cfa8f
comment out unused code, code sections
PietrH May 17, 2023
c540842
devtools::document()
PietrH May 17, 2023
ae0c348
add test headers
PietrH May 17, 2023
da630d2
remove download_vpts_aloft()
PietrH May 17, 2023
40c25f6
establish radars are within controlled list
PietrH May 17, 2023
a9b0459
remove download_vpts_aloft() from NAMESPACE
PietrH May 17, 2023
91a7c41
add return documentation
PietrH May 17, 2023
787fa60
remove download_vpts_aloft() documentation
PietrH May 17, 2023
d1d90fa
add tests regarding dates
PietrH May 22, 2023
71908ab
explicit data masking to avoid R CMD CHECK warnings
PietrH May 22, 2023
507ca60
Merge branch 'master' into download_vpts_aloft
PietrH May 22, 2023
a0f021e
handle missing dates
PietrH May 22, 2023
7d88d9e
warn when no data for radar station within filter
PietrH May 22, 2023
3d50ce9
test format and source parameters
PietrH May 22, 2023
b2d42c3
Check if aws.s3 is installed
PietrH May 22, 2023
da4595a
Handle cases where none or not all requested data was found
PietrH May 22, 2023
b0efe47
Fix error in expected message
PietrH May 22, 2023
66a49a6
use radar station that has data for selected dates
PietrH May 22, 2023
f97060f
Set to warn for missing dates instead of error
PietrH May 22, 2023
ba2740a
use if() instead of assertthat::validate_that()
PietrH May 22, 2023
60ea59a
avoid unnecessary warnings in tests
PietrH May 22, 2023
7e63f76
Add to roxygen function documentation
PietrH May 22, 2023
f0e1b4f
Stylr
PietrH May 22, 2023
4e49b5f
Remove redundant test
PietrH May 22, 2023
2e365a8
add example
PietrH May 22, 2023
327ae92
devtools::document()
PietrH May 22, 2023
d58792d
add jsonlite and purrr dependencies
PietrH May 22, 2023
723b264
Add list_vpts_aloft() to pkgdown manually
PietrH May 23, 2023
8a5f41f
add title
PietrH May 23, 2023
14e4e8b
Merge branch 'master' into download_vpts_aloft
PietrH May 23, 2023
9eb122e
Merge branch 'master' into download_vpts_aloft
PietrH May 23, 2023
d6f7e4c
Merge branch 'download_vpts_aloft' of https://github.com/adokter/bioR…
PietrH May 23, 2023
4840e00
simplify warning, reduce need for assertthat
PietrH May 23, 2023
3b26d6b
add missing comma
PietrH May 23, 2023
1c84b85
Bring documentation in line with other functions, add verbose argumen…
PietrH May 23, 2023
07a3f60
add code sections
PietrH May 23, 2023
c6645f2
tests warning for missing dates
PietrH May 23, 2023
8aee815
test verbose argument
PietrH May 23, 2023
1cb0e5a
split up tests warning for dates, radars, and both combined
PietrH May 23, 2023
ec95e60
rename `verbose` to `show_warnings`
PietrH May 23, 2023
8234c2f
only print warnings when `show_warnings = TRUE`
PietrH May 23, 2023
2038e5c
reword warning
PietrH May 23, 2023
a60850a
style changes
PietrH May 23, 2023
7704f8e
add helper to emulate `stringr::str_extract()`
PietrH May 23, 2023
8fd9deb
use `extract_string` helper: eliminate `jsonlite` Import
PietrH May 23, 2023
0ebf6ab
devtools::document()
PietrH May 23, 2023
d2c1e65
replace `purrr::map_chr` with `sapply`
PietrH May 23, 2023
6ffa3a2
add test for extract_string() helper
PietrH May 23, 2023
e90a2d2
return warning and empty vector on no data, styling
PietrH May 23, 2023
6b91bf1
remove `purrr` Import
PietrH May 23, 2023
2bc1dd2
devtools::document()
PietrH May 23, 2023
2bbb5f8
don't use `letters` in test
PietrH May 23, 2023
85746e5
Increase date_max till extra digit will be needed
PietrH May 23, 2023
4319c16
bump version number: dev referring to pull request
PietrH May 23, 2023
fd3b1b5
add code section for final `return()` statement
PietrH May 23, 2023
1fa5669
avoiding `sapply` as per rOpenSci rec: not type safe
PietrH May 23, 2023
dce67e4
add list_vpts_aloft() to NEWS.md
PietrH May 23, 2023
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: bioRad
Title: Biological Analysis and Visualization of Weather Radar Data
-Version: 0.7.0.9581
+Version: 0.7.0.9579
Description: Extract, visualize and summarize aerial movements of birds and
insects from weather radar data. See <doi:10.1111/ecog.04028>
for a software paper describing package and methodologies.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -124,6 +124,7 @@ export(is.vp)
export(is.vpfile)
export(is.vpi)
export(is.vpts)
export(list_vpts_aloft)
export(map)
export(nexrad_to_odim)
export(noy)
1 change: 1 addition & 0 deletions NEWS.md
@@ -18,6 +18,7 @@ bioRad 0.7 includes a major backend overhaul that deprecates the use of Docker.
* simplify installation, including automatic installation of rhdf5 from bioconductor (#464)
* new sep argument in read_vpts() (#536)
* allow odim files with missing source attribute. Extraction of radar identifier from what/source attribute in read_pvolfiles updated to function as read_vpfiles(), i.e. using the NOD identifier in the source attribute, if missing try RAD, if also missing try WMO, if nothing found use "unknown" (2f6935c).
* new function `list_vpts_aloft()` returns a character vector of aloft archive URLs for time series of vertical profiles (`vpts`). These URLs can then be passed to an external tool to bulk download the data (#553).


# bioRad 0.6.1
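
The NEWS entry for `list_vpts_aloft()` mentions bulk downloading with external tools. As a minimal sketch of one way to do this from R itself, the hypothetical helper `download_urls()` below (not part of bioRad or of this pull request) saves each URL under its own file name with `utils::download.file()`. The `downloader` argument is an assumption added for illustration, so the loop can be exercised without network access.

```r
# Hypothetical helper, not part of bioRad: download every URL into `directory`.
# `downloader` defaults to utils::download.file() and can be replaced by a stub.
download_urls <- function(urls,
                          directory = tempdir(),
                          downloader = utils::download.file) {
  # One destination file per URL, named after the last path component
  destinations <- file.path(directory, basename(urls))
  for (i in seq_along(urls)) {
    # mode = "wb" keeps compressed files intact on Windows
    downloader(urls[i], destinations[i], mode = "wb")
  }
  invisible(destinations)
}

# Assumed usage (needs network access and the aws.s3 package):
# urls <- list_vpts_aloft(
#   radars = "bejab", date_min = "2023-02-01", date_max = "2023-04-01"
# )
# download_urls(urls, directory = tempdir())
```

Command line tools that read a list of URLs from a text file work as well; in that case `writeLines(urls, "urls.txt")` is enough on the R side.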
187 changes: 187 additions & 0 deletions R/list_vpts_aloft.R
@@ -0,0 +1,187 @@
#' List aloft URLs for time series of vertical profiles (`vpts`) of radar
#' stations
#'
#' @param date_min Character, the first date to return URLs for, formatted as
#' `YYYY-MM-DD`.
#' @param date_max Character, the last date to return URLs for, formatted as
#' `YYYY-MM-DD`.
#' @param radars Character vector, radar stations to return URLs for.
#' @param format Character, the format of archive URLs to return, either `csv`
#' or `hdf5`. Currently only `csv` URLs are supported.
#' @param source Character, either `baltrad` or `ecog-04003`.
#' @param show_warnings Logical, whether to print warnings for dates or radar
#' stations for which no data were found.
#'
#' @return A character vector of aloft URLs.
#' @export
#'
#' @examples
#' list_vpts_aloft(radars = "bejab")
list_vpts_aloft <- function(date_min = NULL,
date_max = NULL,
radars = NULL,
format = "csv", # also hdf5
source = "baltrad", # also ecog-04003
show_warnings = TRUE) {
# Check if aws.s3 is installed
# NOTE added because aws.s3 is scheduled to be moved to Suggests

rlang::check_installed("aws.s3",
reason = "to connect to the aloft bucket on Amazon Web Services"
)

# check arguments against vocabulary --------------------------------------
# Check source
valid_sources <- c("baltrad", "ecog-04003")
assertthat::assert_that(
source %in% valid_sources,
msg = glue::glue(
"`source` must be one of: {valid_sources_collapse}.",
valid_sources_collapse = glue::glue_collapse(
glue::backtick(valid_sources), sep = ", "
)
)
)

# Check format
valid_formats <- c("csv", "hdf5")
assertthat::assert_that(
format %in% valid_formats,
msg = glue::glue(
"`format` must be one of: {valid_formats_collapse}.",
valid_formats_collapse = glue::glue_collapse(
glue::backtick(valid_formats), sep = ", "
)
)
)

# check radars
aloft_radars_url <-
paste(
sep = "/",
"https://raw.githubusercontent.com",
"enram",
"aloftdata.eu",
"main",
"_data",
"OPERA_RADARS_DB.json"
)
valid_radars <- readr::read_lines(aloft_radars_url) %>%
extract_string(pattern = '(?<="odimcode": ")[a-z]{5}', perl = TRUE)

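# collapse the missing radars so that `msg` is always a single string
assertthat::assert_that(
all(radars %in% valid_radars),
msg = glue::glue("Can't find radar(s): {missing_radars}",
missing_radars = glue::glue_collapse(
radars[!radars %in% valid_radars],
sep = ", "
)
)
)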

# create file list --------------------------------------------------------
## handle dates -----------------------------------------------------------

# handle missing dates
if (rlang::is_empty(date_min)) {
# if date_min is missing, set it to a date predating any radar observations
date_min <- "1900-01-01"
}
if (rlang::is_empty(date_max)) {
date_max <- "9999-12-31"
}

# Convert to dates
start_date <- as.Date(date_min, tz = NULL)
end_date <- as.Date(date_max, tz = NULL)

## set static urls --------------------------------------------------------
# Set base URL
base_url <- "https://aloft.s3-eu-west-1.amazonaws.com"

# format csv --------------------------------------------------------------
if (format == "csv") {
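# Aloft CSV data are available in daily and monthly files
# This function uses the zipped monthly files, which are faster to download
# Start the sequence on the first day of the month, so no month is skipped
# when date_min falls later in its month than date_max does
months <- format(
seq(as.Date(format(start_date, "%Y-%m-01")), end_date, by = "months"),
"%Y%m"
)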

found_vpts_aloft <-
aws.s3::get_bucket_df(
bucket = "s3://aloft",
prefix = glue::glue("{source}/monthly"),
region = "eu-west-1",
max = Inf
) %>%
dplyr::mutate(
radar = vapply(.data$Key, FUN = function(radar_key) {
strsplit(radar_key, "/", fixed = TRUE)[[1]][3]
}, FUN.VALUE = character(1)),
date = extract_string(.data$Key, "[0-9]{6}")
) %>%
dplyr::filter(
.data$radar %in% radars,
.data$date %in% months
)

# format hdf5 -------------------------------------------------------------
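} else {
# hdf5 files are not supported yet, stop with an informative error
# TODO: create file paths of form
# https://aloft.s3-eu-west-1.amazonaws.com/baltrad/hdf5/bejab/2023/05/02/bejab_vp_20230502T000000Z_0x9.h5
stop("`format = \"hdf5\"` is not supported yet, use `format = \"csv\"`.")
}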

# format found data -------------------------------------------------------
found_radars <-
dplyr::distinct(found_vpts_aloft, .data$radar) %>%
dplyr::pull("radar")

data_urls <-
glue::glue("{base_url}/{keys}",
keys = dplyr::pull(found_vpts_aloft, "Key"),
base_url = base_url
)

# warnings ----------------------------------------------------------------
## warn if no data found --------------------------------------------------
if (rlang::is_empty(data_urls) && show_warnings) {
warning(
glue::glue("No data found for radars between {date_min} - {date_max}")
)
# stop here, no need to warn for radars and dates individually
return(data_urls)
}
## warn missing radar stations --------------------------------------------
# Provide a warning if data couldn't be retrieved for all requested radar
# stations

# compare as sets: `==` depends on element order and recycles silently
all_radars_found <- all(radars %in% found_radars)
if (!all_radars_found && show_warnings) {
warning(
glue::glue(
"Found no data for radars: {missing_radars_collapse}",
missing_radars_collapse =
glue::glue_collapse(
glue::backtick(radars[!radars %in% found_radars]),
sep = ", "
)
)
)
}

## warn missing dates -----------------------------------------------------
# Warn if fewer dates were found than requested
if (!all(months %in% found_vpts_aloft$date) && show_warnings) {
warning(
glue::glue(
"Not every date has radar data, ",
"radars found for {first_date_found} to {last_date_found}",
first_date_found = format(lubridate::ym(min(
found_vpts_aloft$date
)), "%Y-%m"),
last_date_found = format(lubridate::ym(max(
found_vpts_aloft$date
)), "%Y-%m")
)
)
}

# output vector of urls ---------------------------------------------------
return(data_urls)
}
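
The CSV branch of `list_vpts_aloft()` selects monthly files by matching `YYYYMM` labels built with `seq(..., by = "months")`. As a minimal sketch of that step, the hypothetical helper `month_labels()` below (not part of bioRad) moves the start date to the first day of its month before building the sequence. The reason is that `seq()` steps from the day of month of the start date, so a sequence starting on the 15th stops before a `date_max` that falls on the 1st of a later month.

```r
# Hypothetical helper, not part of bioRad: YYYYMM labels for every month that
# overlaps the interval from date_min to date_max.
month_labels <- function(date_min, date_max) {
  start_date <- as.Date(date_min)
  end_date <- as.Date(date_max)
  # Floor the start date to the first day of its month
  first_of_month <- as.Date(format(start_date, "%Y-%m-01"))
  format(seq(first_of_month, end_date, by = "month"), "%Y%m")
}

# Floored start: February, March and April are all covered
floored <- month_labels("2023-02-15", "2023-04-01")

# Unfloored start for comparison: the sequence is 2023-02-15, 2023-03-15,
# and 2023-04-15 lies after date_max, so April is dropped
unfloored <- format(
  seq(as.Date("2023-02-15"), as.Date("2023-04-01"), by = "month"),
  "%Y%m"
)
```

Both vectors can then be compared against the six-digit date extracted from each bucket key, as the `dplyr::filter()` call in the function does.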
19 changes: 19 additions & 0 deletions R/zzz.R
@@ -27,3 +27,22 @@ skip_if_no_mistnet <- function(){
}
testthat::skip("No MistNet")
}


#' Extract strings from a vector using regex, analogous to
#' `stringr::str_extract()`
#'
#' @param string Input vector. A character vector.
#' @param pattern Regex pattern to look for.
#' @param ... Passed on to `regexpr()`.
#'
#' @return A character vector with matches only, possibly of a different length
#' than `string`.
#' @keywords internal
extract_string <- function(string, pattern, ...) {
regmatches(string,
m = regexpr(
pattern = pattern,
text = string,
...
))
}
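
The documentation of `extract_string()` notes that the result can differ in length from the input. A minimal sketch of what that means in practice: `regmatches()` drops elements without a match, whereas `stringr::str_extract()` returns `NA` for them. The variant `extract_string_na()` and the example keys below are hypothetical and only serve as illustration; they are not part of bioRad, and the key layout is an assumption.

```r
# Copy of the helper from this pull request, so the example is self-contained
extract_string <- function(string, pattern, ...) {
  regmatches(string, m = regexpr(pattern = pattern, text = string, ...))
}

# Hypothetical variant that keeps the input length and fills gaps with NA
extract_string_na <- function(string, pattern, ...) {
  m <- regexpr(pattern = pattern, text = string, ...)
  matched <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(string))
  out[matched] <- regmatches(string, m)
  out
}

# Assumed key layout, for illustration only
keys <- c(
  "baltrad/monthly/bejab/2023/bejab_vpts_202302.csv.gz",
  "baltrad/monthly/readme.txt"
)

dropped <- extract_string(keys, "[0-9]{6}")   # one element, no-match removed
kept <- extract_string_na(keys, "[0-9]{6}")   # two elements, second is NA
```

The length-preserving form is the safer choice inside `dplyr::mutate()`, which requires the new column to have as many values as there are rows.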
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -82,6 +82,7 @@ reference:
- example_vp
- plot.vp
- as.data.frame.vp
- list_vpts_aloft
- title: "Combining vertical profiles into time series"
desc: "Functions to combine vertical profiles (vp) into time series (vpts) and read, inspect and plot these."
contents:
23 changes: 23 additions & 0 deletions man/extract_string.Rd

Some generated files are not rendered by default.

43 changes: 43 additions & 0 deletions man/list_vpts_aloft.Rd

Some generated files are not rendered by default.

79 changes: 66 additions & 13 deletions tests/testthat/test-download_vpfiles.R
@@ -1,27 +1,80 @@
test_that("download_vpfiles() returns error on incorrect parameters", {

# Define default for testing
date_min <- "2016-10-01"
date_max <- "2016-11-30"
radars <- c("bejab", "bewid")
directory <- tempdir()
overwrite <- TRUE

-expect_error(download_vpfiles('01/01/2016', date_max, radars, directory, overwrite), "Incorrect date format: 01/01/2016", fixed = TRUE)
-expect_error(download_vpfiles(12345, date_max, radars, directory, overwrite), "date_min is not a string (a length one character vector).", fixed = TRUE)
-expect_error(download_vpfiles(c(date_min,date_max), date_max, radars, directory, overwrite), "date_min is not a string (a length one character vector).", fixed = TRUE)
+expect_error(
+download_vpfiles("01/01/2016", date_max, radars, directory, overwrite),
+"Incorrect date format: 01/01/2016",
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(12345, date_max, radars, directory, overwrite),
+"date_min is not a string (a length one character vector).",
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(c(date_min, date_max), date_max, radars, directory, overwrite),
+"date_min is not a string (a length one character vector).",
+fixed = TRUE
+)

-expect_error(download_vpfiles(date_min, '01/01/2016', radars, directory, overwrite), glue("Incorrect date format: ", '01/01/2016'), fixed = TRUE)
-expect_error(download_vpfiles(date_min ,12345, radars, directory, overwrite), "date_max is not a string (a length one character vector).", fixed = TRUE)
-expect_error(download_vpfiles(date_min, c(date_min,date_max), radars, directory, overwrite), "date_max is not a string (a length one character vector).", fixed = TRUE)
+expect_error(
+download_vpfiles(date_min, "01/01/2016", radars, directory, overwrite),
+glue("Incorrect date format: ", "01/01/2016"),
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(date_min, 12345, radars, directory, overwrite),
+"date_max is not a string (a length one character vector).",
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(date_min, c(date_min, date_max), radars, directory, overwrite),
+"date_max is not a string (a length one character vector).",
+fixed = TRUE
+)

-expect_error(download_vpfiles(date_min, date_max, 'not_radar_code', directory, overwrite), "Radar codes should be 5 characters: not_radar_code", fixed = TRUE)
-expect_error(download_vpfiles(date_min, date_max, c("not_radar_code", "begwid"), directory, overwrite), "Radar codes should be 5 characters: not_radar_code", fixed = TRUE)
-expect_error(download_vpfiles(date_min, date_max, "abcde", directory, overwrite), "Radar codes don't exist: abcde", fixed = TRUE)
+expect_error(
+download_vpfiles(date_min, date_max, "not_radar_code", directory, overwrite),
+"Radar codes should be 5 characters: not_radar_code",
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(
+date_min,
+date_max,
+c("not_radar_code", "begwid"),
+directory,
+overwrite
+),
+"Radar codes should be 5 characters: not_radar_code",
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(date_min, date_max, "abcde", directory, overwrite),
+"Radar codes don't exist: abcde",
+fixed = TRUE
+)


-expect_error(download_vpfiles(date_min, date_max, radars, 1, overwrite), "path is not a string (a length one character vector)", fixed = TRUE)
-expect_error(download_vpfiles(date_min, date_max, radars, 'not_a_directory', overwrite), "Path 'not_a_directory' does not exist", fixed = TRUE)
+expect_error(
+download_vpfiles(date_min, date_max, radars, 1, overwrite),
+"path is not a string (a length one character vector)",
+fixed = TRUE
+)
+expect_error(
+download_vpfiles(date_min, date_max, radars, "not_a_directory", overwrite),
+"Path 'not_a_directory' does not exist",
+fixed = TRUE
+)

-expect_error(download_vpfiles(date_min, date_max, radars, directory, 'not_a_logical'), "overwrite is not a logical", fixed = TRUE)
+expect_error(
+download_vpfiles(date_min, date_max, radars, directory, "not_a_logical"),
+"overwrite is not a logical",
+fixed = TRUE
+)
})
184 changes: 184 additions & 0 deletions tests/testthat/test-list_vpts_aloft.R
@@ -0,0 +1,184 @@
test_that("list_vpts_aloft() returns error for unknown source", {
expect_error(
list_vpts_aloft(
date_min = "2000-01-01",
date_max = "2001-12-15",
radars = "itdec",
source = "not a valid source"
),
regexp = "`source` must be one of: `baltrad`, `ecog-04003`.",
fixed = TRUE
)
})

test_that("list_vpts_aloft() returns error for invalid format", {
expect_error(
list_vpts_aloft(
date_min = "2000-01-01",
date_max = "2001-12-15",
radars = "itdec",
format = "not a valid format"
),
regexp = "`format` must be one of: `csv`, `hdf5`.",
fixed = TRUE
)
})

test_that("list_vpts_aloft() returns error if radar doesn't exist", {
expect_error(
list_vpts_aloft(
date_min = "1990-01-01",
date_max = "2050-01-01",
radars = c("not a valid radar")
),
regexp = "Can't find radar(s): not a valid radar",
fixed = TRUE
)
})

test_that("list_vpts_aloft() returns a character vector", {
expect_type(
list_vpts_aloft(
date_min = "2023-02-01",
date_max = "2023-05-01",
radars = c("bejab", "bewid")
),
"character"
)
})

test_that("list_vpts_aloft() returns no warning when all dates are specified", {
expect_no_warning(
list_vpts_aloft(
radars = "bejab",
date_min = "2023-02-01",
date_max = "2023-04-01"
)
)
})

test_that("list_vpts_aloft() works without specifying dates", {
# just date_min
expect_no_error(
suppressWarnings(list_vpts_aloft(
date_min = "1900-01-01",
radars = "frmtc"
))
)
# just date_max
expect_no_error(
suppressWarnings(list_vpts_aloft(
date_max = Sys.Date(),
radars = "bejab"
))
)
# neither provided
expect_no_error(
suppressWarnings(list_vpts_aloft(
radars = "essse"
))
)
})

test_that("list_vpts_aloft() returns all data when no dates are provided", {
expect_gt(
length(
suppressWarnings(list_vpts_aloft(
radars = "bejab"
))
),
length(
list_vpts_aloft(
radars = "bejab",
date_min = "2023-02-01",
date_max = "2023-04-01"
)
)
)
})

test_that("list_vpts_aloft() warns if data was found for only a subset of radars", {
expect_warning(
list_vpts_aloft(
date_min = "2023-02-01",
date_max = "2023-05-22",
radars = c("nobml", "plpas")
),
regexp = "Found no data for radars: `plpas`",
fixed = TRUE
)
})

test_that("list_vpts_aloft() warns if not all dates were found", {
expect_warning(
list_vpts_aloft(
date_min = "1900-01-01",
date_max = "2023-05-22",
radars = "nobml"
),
regexp = paste("Not every date has radar data,",
"radars found for 2023-02 to 2023-05"),
fixed = TRUE
)
})

test_that("list_vpts_aloft() can warn for both missing radars and dates", {
expect_warning(
list_vpts_aloft(
date_min = "1900-01-01",
date_max = "2023-05-22",
radars = c("nobml", "plpas")
),
regexp = "Found no data for radars: `plpas`",
fixed = TRUE
)
expect_warning(
list_vpts_aloft(
date_min = "1900-01-01",
date_max = "2023-05-22",
radars = c("nobml", "plpas")
),
regexp = paste("Not every date has radar data,",
"radars found for 2023-02 to 2023-05"),
fixed = TRUE
)
})

test_that("list_vpts_aloft() warns and returns empty vector on no data found", {
expect_equal(
list_vpts_aloft(
date_min = "1800-01-01",
date_max = "1800-02-01",
radars = "rssje",
show_warnings = FALSE
),
glue::glue()
)
expect_warning(
list_vpts_aloft(
date_min = "1800-01-01",
date_max = "1800-02-01",
radars = "rssje",
show_warnings = TRUE
)
)
})

test_that("list_vpts_aloft() silences warnings with show_warnings argument", {
expect_no_warning(
list_vpts_aloft(
date_min = "1900-01-01",
date_max = "2023-05-22",
radars = c("nobml", "plpas"),
show_warnings = FALSE
)
)
expect_no_warning(
list_vpts_aloft(
date_min = "1900-01-01",
date_max = "2023-05-22",
radars = "nobml",
show_warnings = FALSE
)
)
})
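
The missing-radar warnings tested in this file depend on comparing the requested radars with the radars that were actually found. As a minimal sketch of an order-independent comparison, the hypothetical helper `missing_radars()` below (not part of bioRad) uses `setdiff()`. An elementwise `==` comparison, by contrast, depends on the order of both vectors and recycles the shorter one.

```r
# Hypothetical helper, not part of bioRad: requested radars without any data
missing_radars <- function(radars, found_radars) {
  setdiff(radars, found_radars)
}

requested <- c("bewid", "bejab")
found <- c("bejab", "bewid") # same radars, different order

# Elementwise comparison reports a mismatch although nothing is missing
elementwise_ok <- all(found == requested)

# Set-based comparison is independent of order
set_ok <- all(requested %in% found)
none_missing <- missing_radars(requested, found)

# One of two requested radars has no data
partly_missing <- missing_radars(c("nobml", "plpas"), "nobml")
```

A test that requests radars in a different order than the bucket lists them would exercise this case directly.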
10 changes: 10 additions & 0 deletions tests/testthat/test-zzz.R
@@ -0,0 +1,10 @@
test_that("extract_string() can extract a string from a vector", {
expect_identical(
extract_string(
"These are the voyages of the starship Enterprise",
"[a-z]{8}(?= Enterprise)",
perl = TRUE
),
"starship"
)
})