---
title: data dictionary
format: gfm
engine: jupyter
execute:
  echo: false
---
This is a first attempt at a basic data dictionary for the indicators used at DataHaven. Edits are made in Airtable, then loaded into a DuckDB database (`gloss.duckdb`). The database is also available in this repo's tagged releases and on the MotherDuck platform (contact Camille if you want access). The releases should make it easier to build an R package that queries sets of definitions, e.g. all indicators used in a specific project, or all indicators associated with a certain source.
The tables in the database are:
```{python}
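import duckdb

# Open the database read-only so rendering the README cannot modify it
con = duckdb.connect('gloss.duckdb', read_only=True)
# List each table with its estimated row count and number of columns
con.sql('SELECT table_name, estimated_size AS rows, column_count AS columns FROM duckdb_tables() ORDER BY table_name;').show()
con.close()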
```
Workflows are managed with Snakemake. The file graph below shows the workflow's rules along with their input and output files:
```{python}
!snakemake --filegraph | dot -T png > dag.png
```
![dag](./dag.png)