title |
---|
List of data quality use cases and examples |
Browse examples that demonstrate how the DQOps platform detects data quality issues and helps you evaluate the results on data quality dashboards.
Here is a comprehensive list of examples with links to the relevant documentation section with detailed descriptions. These examples use openly available datasets from Google Cloud.
Name of the example | Description | Link to the dataset description |
---|---|---|
Integrity check between columns in different tables | This example shows how to check the referential integrity of a column against a column in another table using the lookup_key_found_percent check. | Link |
Detect table availability issues | This example shows how to verify that a query can be executed on a table and that the server does not return errors using the table_availability check. | Link |
Detect incomplete columns | This example shows how to detect incomplete columns that have too many null values using the nulls_count check. | Link |
Detect empty or incomplete tables | This example shows how to find empty or too small tables using the row_count check. | Link |
Percentage of rows having only accepted values | This example shows how to verify that a text column contains only accepted values using the text_found_in_set_percent check. | Link |
Percentage of false boolean values | This example shows how to detect that the percentage of false values remains above a set threshold using the false_percent check. | Link |
Percentage of values in range | This example shows how to detect that the percentage of values within a set range in a column does not exceed a set threshold using the integer_in_range_percent check. | Link |
A text not exceeding a maximum length | This example shows how to check that the length of the text does not exceed the maximum value using the text_max_length check. | Link |
Percentage of duplicates | This example shows how to detect that the percentage of duplicate values in a column does not exceed the maximum accepted percentage using the duplicate_percent check. | Link |
Detect invalid emails | This example shows how to detect that the number of invalid emails in a column does not exceed the maximum accepted count using the invalid_email_format_found check. | DQOps dataset |
Detect invalid IP4 address | This example shows how to detect that the number of invalid IP4 addresses in a column does not exceed a set threshold using the invalid_ip4_address_format_found check. | DQOps dataset |
Percentage of negative values | This example shows how to detect that the percentage of negative values in a column does not exceed a set threshold using the negative_values_percent check. | Link |
Percentage of rows passing SQL condition | This example shows how to detect that the percentage of rows passing an SQL condition does not fall below a set threshold using the sql_condition_passed_percent check. | Link |
Percentage of texts not matching a date pattern | This example shows how to detect that the percentage of texts not matching the date format regex in a column does not exceed a set threshold using the text_not_matching_date_pattern_percent check. | Link |
Percentage of valid currency codes | This example shows how to detect that the percentage of valid currency codes in a column does not fall below a set threshold using the text_valid_currency_code_percent check. | DQOps dataset |
Percentage of valid latitude and longitude | This example shows how to detect that the percentage of valid latitude and longitude values remains above a set threshold using the valid_latitude_percent and valid_longitude_percent checks. | Link |
Percentage of invalid UUID | This example shows how to detect that the percentage of invalid UUID values in a column does not exceed a set threshold using the invalid_uuid_format_percent check. | DQOps dataset |
Percentage of rows containing USA zip codes | This example shows how to detect USA zip codes in text columns by measuring the percentage of rows containing a zip code using the contains_usa_zipcode_percent check. | Link |
Detect table schema changes | This example shows how to detect schema changes on the table using several schema detection checks. | Link |
Detect empty tables | This example shows how to detect empty tables using the default data quality checks. | Link |
Running checks with a scheduler | This example shows how to set different schedules on multiple checks. | Link |
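
Many of the examples listed above follow the same pattern: measure a percentage on a column, then compare it with a threshold. The sketch below illustrates that idea in plain Python for the duplicate percentage case. It is not how DQOps implements its checks (those run as queries against the data source); the function names `duplicate_percent` and `passes_max_percent` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Optional


def duplicate_percent(values: Iterable[Optional[str]]) -> float:
    """Percentage of non-null values that repeat an already seen value."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    distinct = len(set(non_null))
    return 100.0 * (len(non_null) - distinct) / len(non_null)


def passes_max_percent(measured: float, max_percent: float) -> bool:
    """Rule of the 'does not exceed the maximum accepted percentage' kind."""
    return measured <= max_percent


if __name__ == "__main__":
    sample = ["a", "b", "b", "c", None]  # made-up column values
    measured = duplicate_percent(sample)
    print(measured, passes_max_percent(measured, max_percent=10.0))
```

Checks of the "does not fall below a set threshold" kind would use the opposite comparison on the measured value.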
To use the examples you need:
- Installed DQOps.
- A BigQuery service account with the BigQuery > BigQuery Job User permission. You can create a free trial Google Cloud account here.
- A working Google Cloud CLI if you want to use Google Application Credentials authentication.
After installing the Google Cloud CLI, log in to your GCP account by running:
gcloud auth application-default login
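
Before running the login command, it can help to confirm that the Google Cloud CLI is actually reachable from your terminal. The following is a minimal sketch that only checks whether an executable named `gcloud` is on the current `PATH`; the helper name `cli_available` is hypothetical, and the script does not verify credentials or permissions.

```python
import shutil


def cli_available(executable: str) -> bool:
    """Return True when the executable can be found on the current PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if cli_available("gcloud"):
        print("gcloud found; run: gcloud auth application-default login")
    else:
        print("gcloud not found on PATH; install the Google Cloud CLI first")
```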
A standard DQOps installation comes with a set of examples, which can be found in the examples/ directory. You can view a complete list of the examples, with links to detailed explanations, above.
The example directory contains two configuration files: connection.dqoconnection.yaml, which stores the data source configuration, and a *.dqotable.yaml file, which stores the column and table metadata configuration.
While it is not necessary to manually add the connection in our examples, you can find information on how to do it in the Working with DQOps section.
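
To see which configuration files a given example ships with, you can search its directory for the two file name patterns mentioned above. This is a small sketch, not a DQOps tool: the function name `find_example_config` is hypothetical, and because the exact subfolder layout inside an example is not described here, the search is recursive.

```python
from pathlib import Path
from typing import Dict, List


def find_example_config(example_dir: Path) -> Dict[str, List[Path]]:
    """Collect connection and table YAML files found anywhere under example_dir."""
    return {
        "connections": sorted(example_dir.rglob("connection.dqoconnection.yaml")),
        "tables": sorted(example_dir.rglob("*.dqotable.yaml")),
    }


if __name__ == "__main__":
    # Path taken from the example used on this page; adjust to your installation.
    example = Path("examples/data-completeness/number-of-rows-in-the-table-bigquery")
    for kind, paths in find_example_config(example).items():
        print(kind, [str(p) for p in paths])
```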
To start the DQOps application with the example, follow the steps below.
- Go to the directory where you installed DQOps and navigate, for example, to examples/data-completeness/number-of-rows-in-the-table-bigquery. Run the command provided below in the terminal. This will install DQOps on your computer.
=== "Windows"
``` run_dqo ```
=== "MacOS/Linux"
``` ./run_dqo ```
- Create the DQOps user home folder.
After installation, you will be asked whether to initialize the DQOps user home folder in the default location. Type Y to create the folder.
The DQOps user home folder locally stores data such as sensor readings and check results, as well as data source configurations. You can learn more about data storage here.
- Log in to DQOps Cloud.
To use DQOps features such as storing data quality definitions and results in the cloud, or data quality dashboards, you must create a DQOps Cloud account.
After creating the DQOps user home folder, you will be asked whether to log in to DQOps Cloud.
After typing Y, you will be redirected to https://cloud.dqops.com/registration, where you can create a new account, use Google single sign-on (SSO), or log in if you already have an account.
During the first registration, a unique identification code (API Key) is generated and automatically passed to the DQOps application. The API Key is then stored in the configuration file.
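
The first step above uses a different launcher command on Windows than on macOS/Linux. If you script the setup, the choice can be sketched as below. The helper name `launcher_command` is hypothetical; the two command strings are the ones shown in the steps, and the sketch only selects the command without executing it.

```python
import platform
from typing import List


def launcher_command(system: str = "") -> List[str]:
    """Return the launcher command from the steps above for the given OS name."""
    # platform.system() reports "Windows", "Linux" or "Darwin" (macOS).
    system = system or platform.system()
    return ["run_dqo"] if system == "Windows" else ["./run_dqo"]


if __name__ == "__main__":
    print("Run in the example directory:", " ".join(launcher_command()))
```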