Commit
* fixed suggested README.md inaccuracy
* better `build.sbt`
benedeki committed Nov 18, 2024
1 parent 4a01e87 commit cfa6964
Showing 2 changed files with 5 additions and 7 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -78,11 +78,11 @@ the BCBS set of regulations requires analysis and reporting to be based on data
Thus it is critical at the ingestion stage to preserve the accuracy and integrity of the data gathered from a
source system.

The purpose of Atum is to provide means of ensuring no critical fields have been modified during
the processing and no records are added or lost. To do this the library provides an ability
to calculate *hash sums* of explicitly specified columns. We call the set of hash sums at a given time
a *checkpoint* and each hash sum we call a *control measurement*. Checkpoints can be calculated anytime
between Spark transformations and actions.
The purpose of Atum is to provide a means of ensuring that no critical fields have been modified during processing and
that no records are added or lost. To do this, the library provides the ability to calculate *control numbers* of
explicitly specified columns using a selection of aggregate functions. We call the set of such measurements at a given
time a *checkpoint*, and each value (the result of one aggregate computation) a *control measurement*. Checkpoints can
be calculated anytime between Spark transformations and actions, as well as at the start of the process or after its end.

We assume the data for ETL are processed in a series of batch jobs. Let's call each data set for a given batch
job a *batch*. All checkpoints are calculated for a specific batch.
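The checkpoint concept described above can be sketched in plain Scala. This is a hypothetical illustration only, not Atum's actual API: the record type, column choice, and measurement names are all invented for the example; a control measurement is the result of one aggregate function over a column, and a checkpoint is the set of such measurements taken for a batch at one point in time.

```scala
// Hypothetical sketch of control measurements and checkpoints (not Atum's real API).
object ControlMeasurementSketch {
  // One record of a batch, with two explicitly tracked columns.
  final case class Record(id: Long, amount: BigDecimal)

  // One control measurement: the sum aggregate over the `amount` column.
  def sumOfAmount(batch: Seq[Record]): BigDecimal =
    batch.map(_.amount).sum

  // A checkpoint: named control measurements computed for the batch
  // at a given point in the pipeline.
  def checkpoint(batch: Seq[Record]): Map[String, BigDecimal] =
    Map(
      "recordCount" -> BigDecimal(batch.size),
      "sumOfAmount" -> sumOfAmount(batch)
    )
}
```

Comparing checkpoints taken before and after a processing step then reveals whether records were added or lost, or whether the tracked column values changed.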
2 changes: 0 additions & 2 deletions build.sbt
@@ -20,8 +20,6 @@ import Dependencies.*
import Dependencies.Versions.spark3
import VersionAxes.*

ThisBuild / scalaVersion := Setup.scala213.asString // default version TODO

ThisBuild / versionScheme := Some("early-semver")

Global / onChangedBuildSource := ReloadOnSourceChanges