data dictionary

First attempt at a basic data dictionary for indicators used at DataHaven. Edits are made in Airtable and then loaded into a DuckDB database (gloss.duckdb). The database is also published in this repo's tagged releases and on the MotherDuck platform (contact Camille for access). The release is meant to make it easy to build an R package for querying sets of definitions, e.g. all indicators used in a specific project, or all indicators associated with a certain source.
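The kind of lookup the release is meant to support (all indicators for one project, or all indicators from one source) can be sketched in SQL. This is a hypothetical sketch: the README only lists table names, so the column names (project_id, source_id, indicator, definition), the simple one-project-per-variable link (the real schema may use a many-to-many link instead), and the sample rows are all assumptions. It uses Python's built-in sqlite3 so it runs without extra dependencies; against the real gloss.duckdb file you would connect with the duckdb Python package instead and adapt the queries to the actual schema.

```python
import sqlite3

# Hypothetical, simplified schema: the real gloss.duckdb tables have more
# columns (e.g. `variables` has 10), and their names are assumptions here.
SCHEMA = """
CREATE TABLE projects  (project_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE sources   (source_id  TEXT PRIMARY KEY, name TEXT);
CREATE TABLE variables (
    indicator  TEXT PRIMARY KEY,
    definition TEXT,
    source_id  TEXT REFERENCES sources(source_id),
    project_id TEXT REFERENCES projects(project_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for gloss.duckdb with made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO projects VALUES (?, ?)",
                    [("p1", "Example project A"), ("p2", "Example project B")])
    con.executemany("INSERT INTO sources VALUES (?, ?)",
                    [("survey_a", "Example source A"),
                     ("survey_b", "Example source B")])
    con.executemany("INSERT INTO variables VALUES (?, ?, ?, ?)", [
        ("poverty_rate", "Share of residents below poverty line", "survey_a", "p1"),
        ("median_income", "Median household income", "survey_a", "p2"),
        ("asthma_rate", "Share of adults with asthma", "survey_b", "p1"),
    ])
    return con


def indicators_for_project(con, project_id):
    """Return (indicator, definition) pairs used in one project, by name."""
    rows = con.execute(
        "SELECT indicator, definition FROM variables "
        "WHERE project_id = ? ORDER BY indicator",
        (project_id,),
    )
    return rows.fetchall()


def indicators_for_source(con, source_id):
    """Return (indicator, definition) pairs that come from one source."""
    rows = con.execute(
        "SELECT indicator, definition FROM variables "
        "WHERE source_id = ? ORDER BY indicator",
        (source_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(indicators_for_project(con, "p1"))
    print(indicators_for_source(con, "survey_a"))
```

Parameterized queries (the ? placeholders) keep project and source IDs out of the SQL string, which is also how an R wrapper over the same database would typically pass them.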

The tables in the database are:

┌────────────┬───────┬─────────┐
│ table_name │ rows  │ columns │
│  varchar   │ int64 │  int64  │
├────────────┼───────┼─────────┤
│ projects   │     3 │       3 │
│ sources    │    12 │       7 │
│ variables  │   118 │      10 │
│ vocab      │     1 │       6 │
└────────────┴───────┴─────────┘
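A listing like the one above can be regenerated from the database catalog. The query that produced it is not given here; in DuckDB, the duckdb_tables() metadata function exposes estimated_size and column_count columns that come close, although estimated_size is only an estimate. The sketch below computes exact row and column counts instead, using Python's built-in sqlite3 as a stand-in on a small made-up in-memory database rather than gloss.duckdb.

```python
import sqlite3


def table_summary(con: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Return (table_name, rows, columns) for every user table, sorted by name."""
    names = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    summary = []
    for name in names:
        # Names come from the catalog, but quote them so odd names still work.
        quoted = '"' + name.replace('"', '""') + '"'
        n_rows = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        n_cols = len(con.execute(f"PRAGMA table_info({quoted})").fetchall())
        summary.append((name, n_rows, n_cols))
    return summary


def format_summary(summary):
    """Render the summary as a plain-text table with aligned columns."""
    lines = [f"{'table_name':<12} {'rows':>5} {'columns':>7}"]
    for name, n_rows, n_cols in summary:
        lines.append(f"{name:<12} {n_rows:>5} {n_cols:>7}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up tables for illustration only, not the contents of gloss.duckdb.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE projects (id TEXT, name TEXT, lead TEXT)")
    con.execute("CREATE TABLE vocab (term TEXT, meaning TEXT)")
    con.executemany("INSERT INTO projects VALUES (?, ?, ?)",
                    [("p1", "A", "x"), ("p2", "B", "y")])
    print(format_summary(table_summary(con)))
```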

Workflows are managed with Snakemake; the job DAG is shown below:

[Figure: Snakemake DAG of jobs]
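Snakemake builds its job DAG from the input and output files of each rule and runs a job only after the jobs that produce its inputs have finished. As a rough illustration of that ordering idea (not of Snakemake's own implementation), the sketch below orders a hypothetical set of rules for this pipeline with Python's standard-library graphlib. The rule names (fetch_airtable, build_duckdb, summarize_tables, upload_motherduck) are invented for illustration; the actual rules appear only in the DAG figure.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each maps to the set of rules whose outputs it consumes.
# These names are illustrative; the repository's real Snakefile may differ.
RULE_DEPS = {
    "fetch_airtable": set(),
    "build_duckdb": {"fetch_airtable"},
    "summarize_tables": {"build_duckdb"},
    "upload_motherduck": {"build_duckdb"},
    "all": {"summarize_tables", "upload_motherduck"},
}


def job_order(deps):
    """Return one valid execution order: every job comes after its inputs."""
    return list(TopologicalSorter(deps).static_order())


def job_batches(deps):
    """Group jobs into batches whose members have no dependencies on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(job_order(RULE_DEPS))
    for i, batch in enumerate(job_batches(RULE_DEPS), start=1):
        print(i, batch)
```

Jobs in the same batch (here summarize_tables and upload_motherduck) do not depend on each other, which is the property a workflow manager relies on when it runs independent jobs in parallel.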