📝
davidgasquez committed Dec 14, 2022
1 parent db225ad commit a2a44ba
Showing 2 changed files with 4 additions and 4 deletions.
6 changes: 3 additions & 3 deletions Data/Data Engineering.md
@@ -16,12 +16,12 @@ Systems tend towards production and data pipelines aren't an exception. Valuable

### Basic Principles

- **Simplicity**: Each step is easy to understand and modify.
- **Simplicity**: Each step is easy to understand and modify. Rely on immutable data. Write only. No deletes. No updates.
- **Reliability**: Errors in the pipelines can be recovered. Pipelines are monitored and tested. Data is saved in each step (storage is cheap) so it can be used later if needed. For example, adding a new column to a table can be done by extracting the column from the intermediary data without having to query the data source. It is better to support 1 feature that works reliably and has a great UX than 2 that are unreliable or hard to use. One solid step is better than 2 finicky ones.
- **[[Modularity]]**: Steps are independent, declarative, and [[Idempotence | idempotent]].
- **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do.
- **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do. Schema on write.
- **Efficiency**: Low event latency when needed. Easy to scale up and down. A user should not be able to configure something that will not work.
- **Flexibility**: Steps change to conform to data points. Changes don't stop the pipeline or lose data.
- **Flexibility**: Steps change to conform to data points. Changes don't stop the pipeline or lose data. Fail fast and upstream.
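
A minimal Python sketch of three of the principles added above: write-only immutable data, schema on write, and failing fast. The `EVENT_SCHEMA`, `SchemaError`, `validate`, and `AppendOnlyLog` names are hypothetical and exist only for illustration; they are not part of the notes or of any library.

```python
from typing import Optional

# Hypothetical schema for illustration: field name -> required Python type.
EVENT_SCHEMA = {"user_id": int, "event": str}


class SchemaError(ValueError):
    """Raised when a record does not match the schema at write time."""


def validate(record: dict) -> dict:
    """Schema on write: reject malformed records before they are stored."""
    if set(record) != set(EVENT_SCHEMA):
        raise SchemaError(
            f"expected fields {sorted(EVENT_SCHEMA)}, got {sorted(record)}"
        )
    for field, expected_type in EVENT_SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise SchemaError(f"{field!r} must be {expected_type.__name__}")
    return record


class AppendOnlyLog:
    """Write-only store: rows can be appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows = []

    def append(self, record: dict) -> None:
        # Fail fast: validation raises before anything is written.
        self._rows.append(dict(validate(record)))

    def latest(self, user_id: int) -> Optional[dict]:
        # A "change" is a newer row, so the current state is the last match.
        for row in reversed(self._rows):
            if row["user_id"] == user_id:
                return dict(row)
        return None

    def __len__(self) -> int:
        return len(self._rows)


log = AppendOnlyLog()
log.append({"user_id": 1, "event": "signup"})
log.append({"user_id": 1, "event": "login"})  # new row instead of an update
```

Because nothing is overwritten, earlier rows stay available for reprocessing, which is the same idea as saving data at each step under Reliability.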

### Data Flow

2 changes: 1 addition & 1 deletion Idempotence.md
@@ -1,3 +1,3 @@
# Idempotence

[Idempotence](https://en.wikipedia.org/wiki/Idempotence) is the property of a process that, when run 1 or more times, has only the effect of being run once.
[Idempotence](https://en.wikipedia.org/wiki/Idempotence) is the property of a process that, when run 1 or more times, has only the effect of being run once. An idempotent function is deterministic and repeatable.
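
As a rough illustration of the definition, the Python sketch below contrasts a load step that appends rows with one that replaces a whole partition. The `append_rows` and `load_partition` functions and the date key are hypothetical, chosen only to show the behavior under retries.

```python
def append_rows(table: list, rows: list) -> None:
    """Not idempotent: every retry adds the same rows again."""
    table.extend(rows)


def load_partition(table: dict, day: str, rows: list) -> None:
    """Idempotent: the partition for `day` is replaced, so retries converge."""
    table[day] = list(rows)


rows = [{"id": 1}, {"id": 2}]
appended = []
partitioned = {}

# Simulate the same step being retried three times.
for _ in range(3):
    append_rows(appended, rows)
    load_partition(partitioned, "2022-12-14", rows)

print(len(appended))                   # 6: duplicates accumulate
print(len(partitioned["2022-12-14"]))  # 2: same as running once
```

Keying the write by partition is one common way to get this property; it is a design choice for the sketch, not something the note prescribes.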
