

Define a data structure for Auditor 2 "voucher" data, metadata, and media files to be ingested #68

Open
3 tasks
rudokemper opened this issue Jan 23, 2025 · 0 comments
Labels
feature New specs for new behavior

Comments

@rudokemper
Member

rudokemper commented Jan 23, 2025

Feature Request

We need to define a data structure that will feed into an ETL pipeline for storing Auditor 2 "voucher" data, associated metadata, and media files in a data lake.

For this issue, let's focus on the structure of the source data only; the actual pipeline scripting will be handled at a later stage.

We can close this issue once we have a schema that maps incoming source columns (from multiple CSVs) to database columns and file storage paths.

Implementation

  • @abfleishman to create a table with "human-readable" labels
  • @abfleishman to finalize the CSV tables (with human-readable labels added)
  • @rudokemper to create a schema mapping incoming source columns to database columns and file paths
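To illustrate what the schema mapping in the last task might look like, here is a minimal Python sketch. It assumes each mapping entry records the source CSV, the source column header, the target database column, and an optional type conversion, plus a path template for media files. Every CSV name, column name, and path below (`vouchers.csv`, `metadata.csv`, `"Voucher ID"`, `auditor2/media`, etc.) is hypothetical, since the actual Auditor 2 columns are still being finalized.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass(frozen=True)
class ColumnMapping:
    """Maps one column in a source CSV to one database column."""
    source_csv: str                      # hypothetical source file name
    source_column: str                   # column header as it appears in the CSV
    db_column: str                       # target database column name
    cast: Callable[[str], object] = str  # conversion applied to the raw CSV string


# Hypothetical mapping: real Auditor 2 column names are not yet defined.
VOUCHER_MAPPINGS = [
    ColumnMapping("vouchers.csv", "Voucher ID", "voucher_id"),
    ColumnMapping("vouchers.csv", "Recording Date", "recorded_at"),
    ColumnMapping("vouchers.csv", "Duration (s)", "duration_seconds", float),
    ColumnMapping("metadata.csv", "Device Serial", "device_serial"),
]


def map_row(source_csv: str, row: dict[str, str],
            mappings: list[ColumnMapping] = VOUCHER_MAPPINGS) -> dict[str, object]:
    """Translate one CSV row (e.g. from csv.DictReader) into database columns.

    Columns without a mapping are dropped; mapped columns missing from the
    row are skipped rather than filled with a default.
    """
    record: dict[str, object] = {}
    for m in mappings:
        if m.source_csv == source_csv and m.source_column in row:
            record[m.db_column] = m.cast(row[m.source_column])
    return record


def media_path(voucher_id: str, filename: str, root: str = "auditor2/media") -> str:
    """Build a hypothetical data-lake path for a voucher's media file."""
    return str(PurePosixPath(root) / voucher_id / filename)
```

For example, `map_row("vouchers.csv", {"Voucher ID": "V001", "Duration (s)": "12.5"})` yields `{"voucher_id": "V001", "duration_seconds": 12.5}`, and `media_path("V001", "clip.wav")` yields `"auditor2/media/V001/clip.wav"`. Keeping the mapping as plain data, separate from the loading code, would let the schema be reviewed and updated before any pipeline scripting begins.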
@rudokemper rudokemper added the feature New specs for new behavior label Jan 23, 2025
@rudokemper rudokemper added this to the 11th Hour Project milestone Jan 23, 2025