---
title: Data for neighborhood profiles--2022 ACS, 2023 PLACES
engine: knitr
execute:
echo: false
format: gfm
---
# README
Sources for the profile data are the most recent ACS (2022), CDC PLACES (2023 release), and USALEEP (not updated). No fresh analysis is done in this repo. Instead, it assembles data from other projects under readable headings and descriptions, and preps it for each city's neighborhood profiles and online visualization.
Datasets prepped for download from other repos live in their respective tagged releases to keep them stable and reproducible. Not all assets from each tag are used, but the files in those releases are:
```{r}
#| message: false
# Repos and the release tag pinned for each
repos <- list(
  "2022acs" = "dist",
  "cdc_aggs" = "v2023",
  "scratchpad" = "meta"
) |>
  tibble::enframe(name = "repo", value = "tag") |>
  tidyr::unnest(tag)

# Query each release with the GitHub CLI, then tabulate its assets
repos |>
  purrr::pmap(function(repo, tag) {
    q <- stringr::str_glue("gh release view {tag} --repo CT-Data-Haven/{repo} --json tagName,assets,url")
    system(q, intern = TRUE)
  }) |>
  purrr::map(jsonlite::fromJSON) |>
  purrr::map(dplyr::as_tibble) |>
  purrr::map(tidyr::unnest, assets, names_sep = "_") |>
  purrr::map(dplyr::select, tag = tagName, assets_name, url, updated = assets_updatedAt) |>
  # Recover the repo name from the release URL
  purrr::map(dplyr::mutate, repo = stringr::str_extract(url, "(?<=CT\\-Data\\-Haven\\/)(\\w+)(?=\\/)")) |>
  dplyr::bind_rows() |>
  # Link each tag to its release page
  dplyr::mutate(tag = stringr::str_glue("[{tag}]({url})")) |>
  dplyr::group_by(repo, tag, updated) |>
  dplyr::summarise(assets = toString(assets_name), .groups = "drop") |>
  knitr::kable()
```
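To fetch those release assets by hand, something like the following may work. It is only a sketch: the repo and tag pairs are copied from the chunk above, while the `download_cmds` helper and the `data/` output layout are assumptions made for illustration, and the Snakemake workflow may already handle the downloads. It prints the `gh release download` commands rather than running them.

```shell
# Hypothetical helper: print one `gh release download` command per repo/tag pair.
# The org and the pairs come from the chunk above; data/<repo> is an assumed layout.
org="CT-Data-Haven"

download_cmds() {
  while read -r repo tag; do
    echo "gh release download $tag --repo $org/$repo --dir data/$repo"
  done <<'EOF'
2022acs dist
cdc_aggs v2023
scratchpad meta
EOF
}

# Dry run: only prints the commands
download_cmds
```

Piping the output to `sh` would run the downloads, which needs `gh` installed and signed in. Since not every asset is used, adding `--pattern` with a glob to each command would narrow what gets fetched.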
The build uses Snakemake. The available rules are:
```{bash}
snakemake --list-rules
```
The build process is as follows:
```{bash}
snakemake --filegraph | dot -T png > dag.png
```
![snakemake DAG](dag.png)
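Rendering this README shells out to `gh`, `snakemake`, and `dot` (from Graphviz), so a missing tool only shows up partway through. A minimal preflight sketch, where `missing_tools` is a hypothetical helper and not part of this repo:

```shell
# Hypothetical helper: print each named command that is not found on PATH.
# The body runs in a subshell so the loop variable does not leak.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Check the three tools the chunks in this README call
missing=$(missing_tools gh snakemake dot)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```

The check only warns; it does not stop the render or verify tool versions.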