This repository has been archived by the owner on Mar 13, 2020. It is now read-only.

Releases: pageuppeople-opensource/data-pipeline-orchestrator

Add ability to accept initial execution id

11 Oct 03:34
6995e52

The init-execution command now accepts an optional --execution-id parameter where users can provide a GUID themselves.
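As an illustration only, the optional-GUID behaviour described above might look like the sketch below. The function name `resolve_execution_id` is hypothetical, and the release note does not say whether the tool validates a supplied value or how it generates one; both are assumptions here.

```python
import uuid
from typing import Optional


def resolve_execution_id(provided: Optional[str] = None) -> str:
    """Return a canonical GUID string for a new execution.

    Hypothetical helper: uses the caller's GUID when one is given,
    otherwise generates a random one (assumed behaviour).
    """
    if provided is None:
        # No id supplied, so generate a fresh random GUID.
        return str(uuid.uuid4())
    # uuid.UUID raises ValueError when the text is not a well-formed GUID.
    return str(uuid.UUID(provided))
```

Normalising through `uuid.UUID` means a malformed value fails early instead of being stored as-is.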

Start logging stats upon completion of execution and its steps

15 May 01:45
3e1a7b2
v0.1.5-beta

Merge pull request #27 from PageUpPeopleOrg/feature/log-stats-upon-co…

Improve integration tests

14 May 06:16
Pre-release

Merges

  • Run tests using a restricted test user instead of admin (#26)

Add ability to track execution steps and statistics

13 May 03:03
2a6f430

  • introduce a new entity execution_step to track each of the data pipeline execution's steps like LOAD, TRANSFORM, etc.
  • change the type of execution_time_ms to PostgreSQL BIGINT so that it can store larger values
  • update integration tests to cover all changes
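To see why a wider type helps for a millisecond counter, here is a small worked check. It assumes the column was previously a 32-bit PostgreSQL INTEGER, which the release note does not state; the helper names are hypothetical.

```python
# Upper bounds of PostgreSQL's signed 32-bit INTEGER and 64-bit BIGINT.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

MS_PER_DAY = 24 * 60 * 60 * 1000


def fits_integer(execution_time_ms: int) -> bool:
    """True if the duration fits in a 32-bit INTEGER column."""
    return 0 <= execution_time_ms <= INT32_MAX


def fits_bigint(execution_time_ms: int) -> bool:
    """True if the duration fits in a 64-bit BIGINT column."""
    return 0 <= execution_time_ms <= INT64_MAX


# A 32-bit millisecond counter overflows after a little under 25 days.
days_until_int32_overflow = INT32_MAX / MS_PER_DAY
```

Under that assumption, any execution or step lasting longer than roughly 24.8 days would overflow INTEGER but fit comfortably in BIGINT.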

Rename execution tracking tables

15 Apr 02:42
1755cd2
Pre-release

Merges [OSC-1302] Rename tables to match rdl (#22)

Rename to DPO, Integration tests and Alembic

08 Apr 03:59
69deebc

Changes

Add Alembic

  • #20 : add schema revision tool alembic
  • #21 : delete accidental alembic file

Rename MCD to DPO

  • formerly "model-change-detector" now "data-pipeline-orchestrator"

Add Integration Tests

  • add coverage for all new commands in integration tests
  • make integration tests re-runnable
  • apply DRY to assist multiple execution iterations
  • log passing tests post assertion
  • move from plain Bourne shell to Bash, since a 3rd-party gist is now used to generate UUIDs so that integration tests can be re-run on dev machines

Refactor commands to support better state management

04 Mar 23:28
6b16f43
  • renamed the below commands:
    • init command to init-execution
    • complete command to complete-execution
      • this now also calculates the overall execution time between init-execution and complete-execution
  • added the below commands:
    • get-last-successful-execution: Finds the last successful data pipeline execution. Returns an execution-id, which is the GUID identifier of that execution, if found; else returns an empty string.
    • get-execution-last-updated-timestamp: Returns the last-updated-on ISO 8601 datetime with timezone of the given execution-id. Raises an error if given execution-id is invalid.
  • split compare command into:
    • persist-models: Saves models of the given model-type within the given execution-id by persisting hashed checksums of the given models.
    • compare-models: Compares the hashed checksums of models between two executions. Returns comma-separated string of changed model names.
      • this now returns all models when all models have changed OR during the first execution, instead of the "*" returned previously
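The compare-models behaviour described above can be sketched roughly as follows. This is not the project's implementation: the function name, the dictionary representation of persisted checksums, the sorted output order, and the choice to ignore deleted models are all assumptions made for illustration.

```python
from typing import Dict, Optional


def compare_models(previous: Optional[Dict[str, str]],
                   current: Dict[str, str]) -> str:
    """Return a comma-separated string of changed model names.

    Both arguments map a model name to its hashed checksum (assumed shape).
    """
    if not previous:
        # First execution: nothing to compare against, so every model
        # is reported as changed.
        return ",".join(sorted(current))
    changed = [
        name
        for name, checksum in current.items()
        # A model is changed if it is new or its checksum differs.
        if previous.get(name) != checksum
    ]
    return ",".join(sorted(changed))
```

When every checksum differs, the result lists all model names explicitly, which matches the note that all models are returned by name in that case.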

Add new commands - 'compare' and 'complete' data pipeline execution

24 Jan 01:10
5f1ee98

New commands:

  • compare: Compares & persists SHA256-hashed checksums of the given models against those of the last successful execution. Returns comma-separated string of changed model names. Parameters required:
    • execution-id: a GUID identifier of an existing data pipeline execution as returned by the init command.
    • model-type: the type of models being processed, e.g.: load, transform, etc. Model checksums are grouped by model-type, which is also used to find and compare checksums from earlier executions.
    • base-path: absolute or relative path to the models e.g.: ./load, /home/local/load, C:/path/to/load
    • model-patterns: path-based patterns (relative to base-path) to different models with extensions. models within a model-type must be named uniquely regardless of their file extension. e.g.: *.txt, **/*.txt, ./relative/path/to/some_models/**/*.csv, relative/path/to/some/more/related/models/**/*.sql
  • complete: Marks the completion of an existing execution by updating its record in the given database. Returns nothing unless there's an error. Parameter required:
    • execution-id: a GUID identifier of an existing data pipeline execution as returned by the init command.
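A minimal sketch of how base-path and model-patterns could combine to produce SHA256 checksums, under stated assumptions: the function name is hypothetical, models are keyed by file name without extension (inferred from the uniqueness rule above), and the tool's real matching and hashing details may differ.

```python
import glob
import hashlib
import os
from typing import Dict, Iterable


def checksum_models(base_path: str, model_patterns: Iterable[str]) -> Dict[str, str]:
    """Map each matched model name to the SHA256 hex digest of its file."""
    paths_by_name: Dict[str, str] = {}
    checksums: Dict[str, str] = {}
    for pattern in model_patterns:
        # Patterns are resolved relative to base_path; "**" recurses.
        full_pattern = os.path.join(base_path, pattern)
        for path in sorted(glob.glob(full_pattern, recursive=True)):
            if not os.path.isfile(path):
                continue
            path = os.path.normpath(path)
            # Assumed naming rule: the model name is the file name
            # without its extension.
            name = os.path.splitext(os.path.basename(path))[0]
            if name in paths_by_name:
                if paths_by_name[name] == path:
                    continue  # same file matched by overlapping patterns
                raise ValueError(f"duplicate model name: {name}")
            paths_by_name[name] = path
            with open(path, "rb") as model_file:
                checksums[name] = hashlib.sha256(model_file.read()).hexdigest()
    return checksums
```

Raising on a repeated name reflects the requirement that models within a model-type be named uniquely regardless of file extension.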

Support to start a new data pipeline execution

14 Sep 22:59
4cf3072
v0.0.1-alpha