Commit d440d26

Configure additional daily partitioned default checks (nulls, distinct, anomalies, data type changes).
1 parent 27641ef commit d440d26

2 files changed (+182, -16 lines)

docs/dqo-concepts/data-observability.md

+119
@@ -48,6 +48,8 @@ The default data quality checks are automatically activated on all tables and co
but they can be disabled or reconfigured in the DQOps table configuration files [*.dqotable.yaml*](../reference/yaml/TableYaml.md)
as described in the guide for [configuring data quality checks](configuring-data-quality-checks-and-rules.md).

These default configurations are also called **data quality policies**.

### Automatic activation of checks
The [data quality check editor](dqops-user-interface-overview.md#check-editor) in DQOps
shows automatically activated data quality checks as enabled but using a gray color.
@@ -237,6 +239,65 @@ The target column parameters are listed in the following table.
| `data_type_category` | The category of the data type detected by DQOps. DQOps detects a database-independent category of the data type. |


### Targeting multiple data assets
All filters support targeting multiple objects, except the *data_type_category* parameter, which accepts only one value from a fixed enumeration of well-known data type categories.
To target multiple data assets, such as connections, schemas, tables, columns, labels, or data types, provide their names as a comma-separated list.
Each name in the list can also be a pattern with the `*` wildcard, such as `dim_pro*`, which matches all tables whose names start with *dim_pro*.

The following example shows how to target multiple tables.

``` { .yaml linenums="1" .annotate hl_lines="7" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_table_checks
spec:
  priority: 1000
  target:
    table: "fact_sales,dim_pro*"
```
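
The connection and schema filters should accept comma-separated lists in the same way. The following is a minimal sketch, not a verified configuration. It assumes that these filters use the target keys `connection` and `schema`, because the examples in this section only show `table` and `column`. It also assumes that a policy applies only to tables that match all filters at once. The connection and schema names are hypothetical.

``` { .yaml linenums="1" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_table_checks
spec:
  priority: 1000
  target:
    connection: "sales_dwh,sales_landing" # hypothetical connection names; key name assumed
    schema: "public,staging"              # hypothetical schema names; key name assumed
    table: "fact_sales,dim_pro*"          # same table filter as in the example above
```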

The following example shows how to target multiple columns.

``` { .yaml linenums="1" .annotate hl_lines="7" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_column_checks
spec:
  priority: 1000
  target:
    column: "customer_id,product_id"
```
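
Because *data_type_category* accepts only one value, a column-level policy that should cover several columns of the same kind can combine a comma-separated column list with a single category. This is a minimal sketch. The column names are hypothetical, and the `numeric_integer` category value is an assumption. Check the allowed values in the *ColumnDefaultChecksPatternYaml* schema before using it.

``` { .yaml linenums="1" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_column_checks
spec:
  priority: 1000
  target:
    column: "quantity,units_sold"       # hypothetical column names, comma-separated
    data_type_category: numeric_integer # assumed enumeration value; only one value is accepted
```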

## Deactivating the policy
Default data quality check configurations (policies) can be deactivated, and DQOps does not apply disabled policies.
Each default checks configuration file has a *disabled* boolean flag in its *spec* section. Setting the flag to `true` turns the policy off.

The following example shows how to disable a table-level policy.

``` { .yaml linenums="1" .annotate hl_lines="6" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_table_checks
spec:
  priority: 1000
  disabled: true
  target:
    table: "fact_sales,dim_pro*"
```

The following example shows how to disable a column-level policy.

``` { .yaml linenums="1" .annotate hl_lines="6" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_column_checks
spec:
  priority: 1000
  disabled: true
  target:
    column: "customer_id,product_id"
```

## Configuring check patterns in UI
The configuration of the default data quality check patterns in DQOps is found in the *Default checks configuration* node of the *Configuration* section.

@@ -524,6 +585,64 @@ spec:
```

### Default daily partitioned checks
The default configuration of column-level [partition checks](definition-of-data-quality-checks/partition-checks.md)
focuses on detecting anomalies in null values, numeric values, and distinct values across daily partitions. It also detects changes in the data types found in text columns.

The default column-level daily partition checks are described in the table below.

| Category | Data quality check | Description | Data quality rule |
|----------|--------------------|-------------|-------------------|
| [nulls](../categories-of-data-quality-checks/how-to-detect-empty-or-incomplete-columns-with-nulls.md) | <span class="no-wrap-code ">[`daily_partition_nulls_count`](../checks/column/nulls/nulls-count.md#daily-partition-nulls-count)</span> | Counts null values in a monitored column. Detects partially incomplete columns that contain any null values. | _no rules (use the dashboards to review the results)_ |
| [nulls](../categories-of-data-quality-checks/how-to-detect-empty-or-incomplete-columns-with-nulls.md) | <span class="no-wrap-code ">[`daily_partition_nulls_percent`](../checks/column/nulls/nulls-percent.md#daily-partition-nulls-percent)</span> | Measures the percentage of null values in a column. | _no rules (use the dashboards to review the results)_ |
| [nulls](../categories-of-data-quality-checks/how-to-detect-empty-or-incomplete-columns-with-nulls.md) | <span class="no-wrap-code ">[`daily_partition_nulls_percent_anomaly`](../checks/column/nulls/nulls-percent-anomaly.md#daily-partition-nulls-percent-anomaly)</span> | Detects anomalies in the percentage of null values. Identifies the most significant increases or decreases in the rate of null values since the previous day or the last known value. | Raises a *warning* severity issue when the increase or decrease in the percentage of nulls is in the top 1% of the most significant day-to-day changes. |
| [nulls](../categories-of-data-quality-checks/how-to-detect-empty-or-incomplete-columns-with-nulls.md) | <span class="no-wrap-code ">[`daily_partition_not_nulls_percent`](../checks/column/nulls/not-nulls-percent.md#daily-partition-not-nulls-percent)</span> | Detects empty columns by measuring the percentage of non-null values. | _no rules (use the dashboards to review the results)_ |
| [uniqueness](../categories-of-data-quality-checks/how-to-detect-data-uniqueness-issues-and-duplicates.md) | <span class="no-wrap-code ">[`daily_partition_distinct_count_anomaly`](../checks/column/uniqueness/distinct-count-anomaly.md#daily-partition-distinct-count-anomaly)</span> | Detects anomalies in the count of distinct (unique) values. Identifies the most significant increases or decreases in the count of distinct values since the previous day or the last known value. | Raises a *warning* severity issue when the increase or decrease in the count of distinct values is in the top 1% of the most significant day-to-day changes. |
| [anomaly](../categories-of-data-quality-checks/how-to-detect-anomaly-data-quality-issues.md) | <span class="no-wrap-code ">[`daily_partition_sum_anomaly`](../checks/column/anomaly/sum-anomaly.md#daily-partition-sum-anomaly)</span> | Detects anomalies in the sum of numeric values. Identifies the most significant increases or decreases in the sum of values since the previous day or the last known value. **_DQOps activates this check only on numeric columns._** | Raises a *warning* severity issue when the increase or decrease in the sum of numeric values is in the top 1% of the most significant day-to-day changes. |
| [anomaly](../categories-of-data-quality-checks/how-to-detect-anomaly-data-quality-issues.md) | <span class="no-wrap-code ">[`daily_partition_mean_anomaly`](../checks/column/anomaly/mean-anomaly.md#daily-partition-mean-anomaly)</span> | Detects anomalies in the mean (average) of numeric values. Identifies the most significant increases or decreases in the mean of values since the previous day or the last known value. **_DQOps activates this check only on numeric columns._** | Raises a *warning* severity issue when the increase or decrease in the mean of numeric values is in the top 1% of the most significant day-to-day changes. |
| [anomaly](../categories-of-data-quality-checks/how-to-detect-anomaly-data-quality-issues.md) | <span class="no-wrap-code ">[`daily_partition_min_anomaly`](../checks/column/anomaly/min-anomaly.md#daily-partition-min-anomaly)</span> | Detects anomalies in the minimum numeric value (outlier detection). Identifies the most significant increases or decreases in the minimum value since the previous day or the last known value. **_DQOps activates this check only on numeric columns._** | Raises a *warning* severity issue when the increase or decrease in the minimum of numeric values is in the top 1% of the most significant day-to-day changes. |
| [anomaly](../categories-of-data-quality-checks/how-to-detect-anomaly-data-quality-issues.md) | <span class="no-wrap-code ">[`daily_partition_max_anomaly`](../checks/column/anomaly/max-anomaly.md#daily-partition-max-anomaly)</span> | Detects anomalies in the maximum numeric value (outlier detection). Identifies the most significant increases or decreases in the maximum value since the previous day or the last known value. **_DQOps activates this check only on numeric columns._** | Raises a *warning* severity issue when the increase or decrease in the maximum of numeric values is in the top 1% of the most significant day-to-day changes. |
| [datatype](../categories-of-data-quality-checks/how-to-detect-data-type-changes.md) | <span class="no-wrap-code ">[`daily_partition_detected_datatype_in_text_changed`](../checks/column/datatype/detected-datatype-in-text-changed.md#daily-partition-detected-datatype-in-text-changed)</span> | Analyzes values in text columns to detect whether all values are convertible to the same data type (boolean, numeric, date, etc.). **_DQOps activates this check only on text columns._** | Raises a *warning* severity issue when the values in a text column change to a different format, or when a new value that cannot be converted to the previously detected data type appears. For example, the *customer_id* column in a landing zone table always contained integer values, and then a non-numeric value appeared. |
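
The anomaly rules in the table can be read as a percentile test on day-to-day changes. The following formalization is an interpretation of the descriptions above, not the exact formula used by the DQOps rule implementation. Here $p$ stands for the `anomaly_percent` rule parameter, which is 1.0 in the default configuration shown below.

$$
\Delta_t = x_t - x_{t-1}
$$

$$
\text{warning if } \lvert \Delta_t \rvert > Q_{1 - p/100}\big(\{\lvert \Delta_s \rvert : s < t\}\big)
$$

In these formulas, $x_t$ is the value measured for the daily partition $t$ (for example, the percentage of nulls), and $x_{t-1}$ is the value for the previous day or the last known value. $Q_q$ is the $q$-quantile of the absolute changes observed in earlier partitions. With $p = 1.0$, the threshold is the 0.99 quantile, so only changes in the top 1% of historical day-to-day changes raise a warning.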

The following extract of the *patterns/default.dqocolumnpattern.yaml* file shows the configuration
of the default column-level [partition checks](definition-of-data-quality-checks/partition-checks.md).

``` { .yaml linenums="1" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_column_checks
spec:
  partitioned_checks:
    daily:
      nulls:
        daily_partition_nulls_count: {}
        daily_partition_nulls_percent: {}
        daily_partition_nulls_percent_anomaly:
          warning:
            anomaly_percent: 1.0
        daily_partition_not_nulls_percent: {}
      uniqueness:
        daily_partition_distinct_count_anomaly:
          warning:
            anomaly_percent: 1.0
      anomaly:
        daily_partition_sum_anomaly:
          warning:
            anomaly_percent: 1.0
        daily_partition_mean_anomaly:
          warning:
            anomaly_percent: 1.0
        daily_partition_min_anomaly:
          warning:
            anomaly_percent: 1.0
        daily_partition_max_anomaly:
          warning:
            anomaly_percent: 1.0
      datatype:
        daily_partition_detected_datatype_in_text_changed:
          warning: {}
```
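
If the default anomaly checks raise too many warnings, the `anomaly_percent` parameter can be tuned in the same file. If the parameter keeps the meaning described in the table above, lowering it from 1.0 to 0.5 should limit warnings to the top 0.5% of day-to-day changes. The following trimmed fragment is a sketch that changes only the sum and mean anomaly checks. The value 0.5 is illustrative, not a DQOps recommendation. The other categories and checks of the full file are omitted here and must stay in the real file.

``` { .yaml linenums="1" }
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnDefaultChecksPatternYaml-schema.json
apiVersion: dqo/v1
kind: default_column_checks
spec:
  partitioned_checks:
    daily:
      # the nulls, uniqueness and datatype categories are omitted from this fragment
      anomaly:
        daily_partition_sum_anomaly:
          warning:
            anomaly_percent: 0.5 # illustrative value; the default configuration uses 1.0
        daily_partition_mean_anomaly:
          warning:
            anomaly_percent: 0.5 # illustrative value; the default configuration uses 1.0
        # daily_partition_min_anomaly and daily_partition_max_anomaly are omitted from this fragment
```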


## Next steps
- Learn how to [monitor, review and react to data quality issues](../working-with-dqo/daily-monitoring-of-data-quality.md) detected by the default data quality checks.
