This repository has been archived by the owner on Mar 13, 2020. It is now read-only.
Releases: pageuppeople-opensource/data-pipeline-orchestrator
Add ability to accept initial execution id
The init-execution command now accepts an optional --execution-id parameter where users can provide a GUID themselves.
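As a rough illustration of how an optional, user-supplied GUID could be accepted, here is a minimal sketch using Python's argparse and uuid modules. It is not taken from the project: the program name example-cli, the helper names, and the fallback of generating a random GUID when the parameter is omitted are all assumptions. Only the init-execution command name and the --execution-id parameter come from the release note.

```python
import argparse
import uuid


def parse_execution_id(value: str) -> str:
    """Validate a user-supplied GUID and return its canonical string form."""
    try:
        return str(uuid.UUID(value))
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid GUID: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser layout; the real CLI's structure is not shown here.
    parser = argparse.ArgumentParser(prog="example-cli")
    subcommands = parser.add_subparsers(dest="command", required=True)
    init = subcommands.add_parser("init-execution")
    init.add_argument("--execution-id", type=parse_execution_id, default=None)
    return parser


def resolve_execution_id(args: argparse.Namespace) -> str:
    # Assumption: when no id is supplied, a fresh random GUID is generated.
    return args.execution_id or str(uuid.uuid4())
```

Validating at parse time means a malformed id is rejected before any execution record would be written.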
Start logging stats upon completion of execution and its steps
v0.1.5-beta Merge pull request #27 from PageUpPeopleOrg/feature/log-stats-upon-co…
Improve integration tests
Merges
- Run tests using a restricted test user instead of admin (#26)
Add ability to track execution steps and statistics
- introduce a new entity execution_step to track each of the data pipeline execution's steps like LOAD, TRANSFORM, etc.
- update type of execution_time_ms to store up to PostgreSQL 'BIGINT' type
- update integration tests to cover all changes
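To show why a wider type can matter for a millisecond counter, here is a small sketch. The ExecutionStep fields and the max_days helper are hypothetical, and the comparison against a 4-byte INTEGER is an assumption, since the note does not say what the previous column type was. The integer ranges themselves are the ones PostgreSQL documents for its signed 4-byte and 8-byte types.

```python
from dataclasses import dataclass

# PostgreSQL INTEGER is a signed 4-byte type, BIGINT a signed 8-byte type.
PG_INTEGER_MAX = 2**31 - 1
PG_BIGINT_MAX = 2**63 - 1

MS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass
class ExecutionStep:
    # Hypothetical shape; the real execution_step entity may differ.
    execution_id: str
    step_name: str  # e.g. "LOAD" or "TRANSFORM"
    execution_time_ms: int

    def __post_init__(self) -> None:
        if not 0 <= self.execution_time_ms <= PG_BIGINT_MAX:
            raise ValueError("execution_time_ms does not fit in a PostgreSQL BIGINT")


def max_days(limit_ms: int) -> float:
    """Longest duration, in days, that a millisecond counter with this limit can hold."""
    return limit_ms / MS_PER_DAY
```

Under that assumption, a 4-byte counter tops out at just under 25 days of milliseconds, while the 8-byte range is far beyond any realistic execution time.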
Rename execution tracking tables
Merges [OSC-1302] Rename tables to match rdl (#22)
Rename to DPO, Integration tests and Alembic
Changes
Add Alembic
Rename MCD to DPO
- formerly "model-change-detector" now "data-pipeline-orchestrator"
Add Integration Tests
Add coverage for all new commands in integration tests
- add coverage for all new commands in integration tests
- make integration tests re-runnable
- apply DRY to assist multiple execution iterations
- log passing tests post assertion
- move from plain Bourne shell to Bash shell since we now use a 3rd-party gist to generate UUIDs to allow us to re-run integration tests on dev machines
Refactor commands to support better state management
- renamed the below commands:
  - init command to init-execution
  - complete command to complete-execution
    - this now also calculates the overall execution time between init-execution and complete-execution
- added the below commands:
  - get-last-successful-execution: Finds the last successful data pipeline execution. Returns an execution-id which is a GUID identifier of the new execution, if found; else returns an empty string.
  - get-execution-last-updated-timestamp: Returns the last-updated-on ISO 8601 datetime with timezone of the given execution-id. Raises an error if the given execution-id is invalid.
- split compare command into:
  - persist-models: Saves models of the given model-type within the given execution-id by persisting hashed checksums of the given models.
  - compare-models: Compares the hashed checksums of models between two executions. Returns comma-separated string of changed model names.
    - this now returns all models when all models have changed OR during first execution instead of the previous *
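The compare-models behaviour described above can be sketched as a comparison of two name-to-checksum mappings. This is an illustration only, not the project's code: the function name, the dictionary inputs, the sorted output order, and the choice to ignore models that exist only in the previous execution are all assumptions.

```python
from typing import Dict


def compare_models(previous: Dict[str, str], current: Dict[str, str]) -> str:
    """Return a comma-separated string of model names whose checksum changed."""
    if not previous:
        # First execution: nothing to compare against, so every model is reported.
        return ",".join(sorted(current))
    changed = [
        name
        for name, checksum in current.items()
        # A model is changed when it is new or its checksum differs.
        if previous.get(name) != checksum
    ]
    return ",".join(sorted(changed))
```

With this shape, the "all models changed" and "first execution" cases both yield the full list of names, and an unchanged set yields an empty string.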
Add new commands - 'compare' and 'complete' data pipeline execution
New commands:
- compare: Compares & persists SHA256-hashed checksums of the given models against those of the last successful execution. Returns comma-separated string of changed model names. Parameters required:
  - execution-id: a GUID identifier of an existing data pipeline execution as returned by the init command.
  - model-type: type of models being processed e.g.: load, transform, etc. This model-type is used to group the model checksums and to find and compare older ones.
  - base-path: absolute or relative path to the models e.g.: ./load, /home/local/load, C:/path/to/load
  - model-patterns: path-based patterns (relative to base-path) to different models with extensions. Models within a model-type must be named uniquely regardless of their file extension. e.g.: *.txt, **/*.txt, ./relative/path/to/some_models/**/*.csv, relative/path/to/some/more/related/models/**/*.sql
- complete: Marks the completion of an existing execution by updating a record for the same in the given database. Returns nothing unless there's an error. Parameter required:
  - execution-id: a GUID identifier of an existing data pipeline execution as returned by the init command.
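One way to picture how base-path and model-patterns could combine into SHA256 checksums is the sketch below. It is a guess at the general technique rather than the project's implementation: the function name, the use of the file stem as the model name, and raising an error on a name clash are assumptions made to reflect the "named uniquely regardless of their file extension" rule.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def checksum_models(base_path: str, model_patterns: Iterable[str]) -> Dict[str, str]:
    """Map each model name to the SHA256 hex digest of its file contents."""
    base = Path(base_path)
    seen_paths: Dict[str, Path] = {}
    checksums: Dict[str, str] = {}
    for pattern in model_patterns:
        # Patterns are resolved relative to base_path.
        for path in sorted(base.glob(pattern)):
            if not path.is_file():
                continue
            name = path.stem  # model name without its file extension
            earlier = seen_paths.get(name)
            if earlier is not None and earlier != path:
                raise ValueError(f"duplicate model name {name!r}: {earlier} and {path}")
            # The same file matched by overlapping patterns is simply re-recorded.
            seen_paths[name] = path
            checksums[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return checksums
```

Hashing file contents rather than timestamps means a model only counts as changed when its bytes differ.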
Support to start a new data pipeline execution
v0.0.1-alpha