Commit 4dfcc98

2 parents 31a1ee3 + 4c82dd1 commit 4dfcc98

36 files changed: +14026 -0 lines changed
@@ -0,0 +1 @@
+,DESKTOP-A6A380Q/lyftrondev,DESKTOP-A6A380Q,30.10.2024 09:22,file:///C:/Users/lyftrondev/AppData/Roaming/LibreOffice/4;

dqops/sampledata/house_price_prediction_treated_dataset.csv

+6,701
Large diffs are not rendered by default.
@@ -0,0 +1 @@
+This is a marker file to identify a DQO_USER_HOME folder. Please check this file in to Git.
@@ -0,0 +1,8 @@
+.credentials/
+.data/
+.index/
+.logs/
+bin/
+jars/
+.venv/
+.localsettings.dqosettings.yaml
@@ -0,0 +1,19 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: Monitors the count of distinct values in a column and raises an issue
+    when an anomaly is detected.
+  monitoring_checks:
+    daily:
+      uniqueness:
+        daily_distinct_count_anomaly:
+          warning:
+            anomaly_percent: 0.1
+  partitioned_checks:
+    daily:
+      uniqueness:
+        daily_partition_distinct_count_anomaly:
+          warning:
+            anomaly_percent: 0.1
@@ -0,0 +1,19 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: Monitors the scale of null values in columns and raises an issue when
+    the day-to-day change is significant.
+  monitoring_checks:
+    daily:
+      nulls:
+        daily_nulls_percent_anomaly:
+          warning:
+            anomaly_percent: 0.1
+  partitioned_checks:
+    daily:
+      nulls:
+        daily_partition_nulls_percent_anomaly:
+          warning:
+            anomaly_percent: 0.1
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: Monitors the sum and average (mean) aggregated values of numeric values
+    and raises a data quality issue when the value changes too much between daily
+    partitions.
+  partitioned_checks:
+    daily:
+      anomaly:
+        daily_partition_sum_anomaly:
+          warning:
+            anomaly_percent: 0.05
+        daily_partition_mean_anomaly:
+          warning:
+            anomaly_percent: 0.05
@@ -0,0 +1,16 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: Monitors the sum and average (mean) aggregated values of numeric values
+    and raises a data quality issue when the value changes too much day-to-day.
+  monitoring_checks:
+    daily:
+      anomaly:
+        daily_sum_anomaly:
+          warning:
+            anomaly_percent: 0.05
+        daily_mean_anomaly:
+          warning:
+            anomaly_percent: 0.05
@@ -0,0 +1,13 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: Monitors data volume of the whole table daily and raises an issue when
+    the volume has increased or decreased significantly.
+  monitoring_checks:
+    daily:
+      volume:
+        daily_row_count_change:
+          warning:
+            max_percent: 10.0
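As a rough illustration of what a `max_percent: 10.0` threshold on a row count change could mean, the sketch below compares two consecutive daily row counts. It is not DQOps code: the function names are hypothetical, and it assumes the rule measures the absolute change relative to the previous day's count.

```python
def percent_change(previous: float, current: float) -> float:
    """Absolute change between two readouts, as a percentage of the previous one."""
    if previous == 0:
        # Assumption: a change away from zero has no finite percentage.
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) * 100.0 / abs(previous)


def row_count_change_warning(previous_rows: int, current_rows: int,
                             max_percent: float = 10.0) -> bool:
    """Return True when the day-to-day change exceeds the max_percent threshold."""
    return percent_change(previous_rows, current_rows) > max_percent


# Hypothetical row counts for two consecutive days.
print(row_count_change_warning(1000, 1050))  # 5% change, below the threshold
print(row_count_change_warning(1000, 1200))  # 20% change, above the threshold
```

Under these assumptions, both a growth and a shrinkage of more than 10% would raise a warning, which matches the "increased or decreased" wording of the policy description.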
@@ -0,0 +1,18 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: Detects when the values stored in a text column change their type.
+    This policy should be activated on raw tables in the landing zones for tables that
+    store all values (also numeric and dates) in text columns.
+  monitoring_checks:
+    daily:
+      datatype:
+        daily_detected_datatype_in_text_changed:
+          warning: {}
+  partitioned_checks:
+    daily:
+      datatype:
+        daily_partition_detected_datatype_in_text_changed:
+          warning: {}
@@ -0,0 +1,14 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: "Monitors the schema of columns registered in DQOps. Raises a data\
+    \ quality issue when the column is missing, or its data has changed."
+  monitoring_checks:
+    daily:
+      schema:
+        daily_column_exists:
+          warning: {}
+        daily_column_type_changed:
+          warning: {}
@@ -0,0 +1,20 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  disabled: true
+  description: Detects columns containing any null values using both monitoring checks
+    and daily partitioned checks.
+  monitoring_checks:
+    daily:
+      nulls:
+        daily_nulls_count:
+          warning:
+            max_count: 0
+  partitioned_checks:
+    daily:
+      nulls:
+        daily_partition_nulls_count:
+          warning:
+            max_count: 0
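To make the `max_count: 0` threshold concrete, here is a minimal sketch of a null count check over one column. The helper names are hypothetical and the column is modelled as a plain Python list with `None` standing in for SQL NULL; DQOps itself runs such checks against the data source.

```python
from typing import Iterable, Optional


def nulls_count(values: Iterable[Optional[object]]) -> int:
    """Count the None entries (standing in for NULL) in a column."""
    return sum(1 for value in values if value is None)


def nulls_count_warning(values: Iterable[Optional[object]], max_count: int = 0) -> bool:
    """Return True when the column holds more nulls than max_count allows."""
    return nulls_count(values) > max_count


# Hypothetical column values.
print(nulls_count_warning([1, 2, 3]))        # no nulls, no warning
print(nulls_count_warning([1, None, 3]))     # one null exceeds max_count=0
```

With `max_count: 0`, a single null value is enough to raise a warning, which is why the policy description speaks of columns containing any null values.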
@@ -0,0 +1,12 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: Monitors data freshness anomalies daily.
+  monitoring_checks:
+    daily:
+      timeliness:
+        daily_data_freshness_anomaly:
+          warning:
+            anomaly_percent: 0.1
@@ -0,0 +1,19 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: "Monitors data volume of the whole table (using daily monitoring checks)\
+    \ and for each daily partition, using daily partition checks."
+  monitoring_checks:
+    daily:
+      volume:
+        daily_row_count_anomaly:
+          warning:
+            anomaly_percent: 0.1
+  partitioned_checks:
+    daily:
+      volume:
+        daily_partition_row_count_anomaly:
+          warning:
+            anomaly_percent: 0.1
@@ -0,0 +1,18 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  disabled: true
+  description: Detects empty columns using both monitoring checks and daily partitioned
+    checks.
+  monitoring_checks:
+    daily:
+      nulls:
+        daily_empty_column_found:
+          warning: {}
+  partitioned_checks:
+    daily:
+      nulls:
+        daily_partition_empty_column_found:
+          warning: {}
@@ -0,0 +1,12 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: Detects empty tables using daily monitoring checks.
+  monitoring_checks:
+    daily:
+      volume:
+        daily_row_count:
+          warning:
+            min_count: 1
@@ -0,0 +1,18 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  disabled: true
+  description: Monitors numeric columns to detect new smallest (min) or biggest (max)
+    value for each daily partition. Raises a data quality issue when the partition
+    contains a big or small value that exceeds regular ranges.
+  partitioned_checks:
+    daily:
+      anomaly:
+        daily_partition_min_anomaly:
+          warning:
+            anomaly_percent: 0.1
+        daily_partition_max_anomaly:
+          warning:
+            anomaly_percent: 0.1
@@ -0,0 +1,16 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: "Monitors numeric columns to detect new smallest (min) or biggest (max)\
+    \ value, which must be an anomaly."
+  monitoring_checks:
+    daily:
+      anomaly:
+        daily_min_anomaly:
+          warning:
+            anomaly_percent: 0.1
+        daily_max_anomaly:
+          warning:
+            anomaly_percent: 0.1
@@ -0,0 +1,20 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  disabled: true
+  description: Monitors the percentage of null values in columns and raises an issue
+    when the day-to-day change is above a threshold.
+  monitoring_checks:
+    daily:
+      nulls:
+        daily_nulls_percent_change:
+          warning:
+            max_percent: 10.0
+  partitioned_checks:
+    daily:
+      nulls:
+        daily_partition_nulls_percent_change:
+          warning:
+            max_percent: 10.0
@@ -0,0 +1,12 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: Monitors table availability issues daily.
+  monitoring_checks:
+    daily:
+      availability:
+        daily_table_availability:
+          warning:
+            max_failures: 0
@@ -0,0 +1,18 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: Monitors the table schema and raises issues when the schema of the
+    table was changed.
+  monitoring_checks:
+    daily:
+      schema:
+        daily_column_count_changed:
+          warning: {}
+        daily_column_list_changed:
+          warning: {}
+        daily_column_list_or_order_changed:
+          warning: {}
+        daily_column_types_changed:
+          warning: {}
@@ -0,0 +1,12 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 1000
+  description: "Activates data profiling checks on all text columns to detect if they\
+    \ contain sensitive data (emails, phone numbers). Enabling this policy allows\
+    \ the data quality rule miner to set up PII checks when sensitive values are identified."
+  profiling_checks:
+    pii:
+      profile_contains_usa_phone_percent: {}
+      profile_contains_email_percent: {}
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ColumnLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_column_checks
+spec:
+  priority: 2000
+  description: Monitors the count and the percentage of null values without raising
+    data quality issues.
+  monitoring_checks:
+    daily:
+      nulls:
+        daily_nulls_count: {}
+        daily_nulls_percent: {}
+  partitioned_checks:
+    daily:
+      nulls:
+        daily_partition_nulls_count: {}
+        daily_partition_nulls_percent: {}
@@ -0,0 +1,10 @@
+# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableLevelDataQualityPolicyYaml-schema.json
+apiVersion: dqo/v1
+kind: default_table_checks
+spec:
+  priority: 1000
+  description: Monitors volume (row count) of daily partitions.
+  partitioned_checks:
+    daily:
+      volume:
+        daily_partition_row_count: {}
@@ -0,0 +1 @@
+# packages in this file are installed when DQOps starts
@@ -0,0 +1,2 @@
+@echo off
+..\..\..\dqo.cmd
@@ -0,0 +1,2 @@
+#!/bin/sh
+../../../dqo
