From a2a44ba4771fc7e6f430ee5f922a08dbbdb64b6d Mon Sep 17 00:00:00 2001
From: David Gasquez
Date: Wed, 14 Dec 2022 10:21:25 +0100
Subject: [PATCH] :memo:

---
 Data/Data Engineering.md | 6 +++---
 Idempotence.md           | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Data/Data Engineering.md b/Data/Data Engineering.md
index a771ae0..6f2149b 100644
--- a/Data/Data Engineering.md
+++ b/Data/Data Engineering.md
@@ -16,12 +16,12 @@ Systems tend towards production and data pipelines aren't an exception. Valuable
 
 ### Basic Principles
 
-- **Simplicity**: Each steps is easy to understand and modify.
+- **Simplicity**: Each steps is easy to understand and modify. Rely on immutable data. Write only. No deletes. No updates.
 - **Reliability**: Errors in the pipelines can be recovered. Pipelines are monitored and tested. Data is saved in each step (storage is cheap) so it can be used later if needed. For example, adding a new column to a table can be done extracting the column from the intermediary data without having to query the data source. It is better to support 1 feature that works reliably and has a great UX than 2 that are unreliable or hard to use. One solid step is better than 2 finicky ones.
 - **[[Modularity]]**: Steps are independent, declarative, and [[Idempotence | itempotent]].
-- **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do.
+- **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do. Schema on write.
 - **Efficiency**: Low event latency when needed. Easy to scale up and down. A user should not be able to configure something that will not work.
-- **Flexibility**: Steps change to conform data points. Changes don't stop the pipeline or losses data.
+- **Flexibility**: Steps change to conform data points. Changes don't stop the pipeline or losses data. Fail fast and upstream.
 
 ### Data Flow
 
diff --git a/Idempotence.md b/Idempotence.md
index 8cab01f..8ec9d84 100644
--- a/Idempotence.md
+++ b/Idempotence.md
@@ -1,3 +1,3 @@
 # Idempotence
 
-[Idempotence](https://en.wikipedia.org/wiki/Idempotence) is the property of a process that when run 1 or more times, it only has the effect of being run once.
+[Idempotence](https://en.wikipedia.org/wiki/Idempotence) is the property of a process that when run 1 or more times, it only has the effect of being run once. An idempotent function is deterministic and repeatable.
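
Review note on the `Idempotence.md` hunk: the definition ("when run 1 or more times, it only has the effect of being run once") and the "Write only. No deletes. No updates." principle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the patched notes: the in-memory `Store`, the partition key, and the function names `append_step` and `write_once_step` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory "storage": partition key -> rows.
Store = Dict[str, List[dict]]


def append_step(store: Store, partition: str, rows: List[dict]) -> None:
    """Not idempotent: every re-run appends the same rows again."""
    store.setdefault(partition, []).extend(rows)


def write_once_step(store: Store, partition: str, rows: List[dict]) -> None:
    """Idempotent and write-only: a partition is written once, never updated."""
    if partition in store:
        return  # already written; a re-run has no further effect
    store[partition] = list(rows)


rows = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

appended: Store = {}
append_step(appended, "2022-12-14", rows)
append_step(appended, "2022-12-14", rows)  # a retry duplicates the data

written: Store = {}
write_once_step(written, "2022-12-14", rows)
write_once_step(written, "2022-12-14", rows)  # a retry changes nothing

print(len(appended["2022-12-14"]))  # 4
print(len(written["2022-12-14"]))   # 2
```

Writing each partition once is only one way to get idempotence; it was chosen here because it also respects the no-updates rule added in the first hunk.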