List of data quality use cases and examples

Check the examples below to see how the DQOps platform detects data quality issues and helps you evaluate the results on data quality dashboards.

List of data quality use cases

Below is a list of examples, each with a link to the documentation section that describes it in detail. These examples use publicly available datasets from Google Cloud.

Data accuracy

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Integrity check between columns in different tables | This example shows how to check the referential integrity of a column against a column in another table using the lookup_key_found_percent check. | Link |

Data availability

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Detect table availability issues | This example shows how to verify that a query can be executed on a table and that the server does not return errors using the table_availability check. | Link |

Data completeness

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Detect incomplete columns | This example shows how to detect incomplete columns that have too many null values using the nulls_count check. | Link |
| Detect empty or incomplete tables | This example shows how to find empty or too small tables using the row_count check. | Link |

Data consistency

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Percentage of rows having only accepted values | This example shows how to verify that a text column contains only accepted values using the text_found_in_set_percent check. | Link |

Data reasonability

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Percentage of false boolean values | This example shows how to detect that the percentage of false values remains above a set threshold using the false_percent check. | Link |
| Percentage of values in range | This example shows how to detect that the percentage of values within a set range in a column does not exceed a set threshold using the integer_in_range_percent check. | Link |
| A text not exceeding a maximum length | This example shows how to check that the length of the text does not exceed the maximum value using the text_max_length check. | Link |

Data uniqueness

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Percentage of duplicates | This example shows how to detect that the percentage of duplicate values in a column does not exceed the maximum accepted percentage using the duplicate_percent check. | Link |

Data validity

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Detect invalid emails | This example shows how to detect that the number of invalid emails in a column does not exceed the maximum accepted count using the invalid_email_format_found check. | DQOps dataset |
| Detect invalid IP4 addresses | This example shows how to detect that the number of invalid IP4 addresses in a column does not exceed a set threshold using the invalid_ip4_address_format_found check. | DQOps dataset |
| Percentage of negative values | This example shows how to detect that the percentage of negative values in a column does not exceed a set threshold using the negative_values_percent check. | Link |
| Percentage of rows passing SQL condition | This example shows how to detect that the percentage of rows passing an SQL condition does not fall below a set threshold using the sql_condition_passed_percent check. | Link |
| Percentage of texts not matching a date pattern | This example shows how to detect that the percentage of texts not matching the date format regex in a column does not exceed a set threshold using the text_not_matching_date_pattern_percent check. | Link |
| Percentage of valid currency codes | This example shows how to detect that the percentage of valid currency codes in a column does not fall below a set threshold using the text_valid_currency_code_percent check. | DQOps dataset |
| Percentage of valid latitude and longitude | This example shows how to detect that the percentage of valid latitude and longitude values remains above a set threshold using the valid_latitude_percent and valid_longitude_percent checks. | Link |
| Percentage of invalid UUID | This example shows how to detect that the percentage of invalid UUID values in a column does not exceed a set threshold using the invalid_uuid_format_percent check. | DQOps dataset |
| Percentage of rows containing USA zip codes | This example shows how to detect USA zip codes in text columns by measuring the percentage of rows containing a zip code using the contains_usa_zipcode_percent check. | Link |

Schema

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Detect table schema changes | This example shows how to detect schema changes on the table using several schema detection checks. | Link |

Data quality monitoring

| Name of the example | Description | Link to the dataset description |
|---|---|---|
| Detect empty tables | This example shows how to detect empty tables using the default data quality checks. | Link |
| Running checks with a scheduler | This example shows how to set different schedules on multiple checks. | Link |

Prerequisite

To use the examples, you need a GCP account and a working Google Cloud CLI.

After installing Google Cloud CLI, log in to your GCP account by running:

gcloud auth application-default login
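
As a small pre-flight sketch, the function below checks that the `gcloud` executable is on the `PATH` before you try to authenticate. The function name `check_gcloud` is made up for this illustration and is not part of DQOps or the Google Cloud CLI.

```shell
# Hypothetical helper: verify that the Google Cloud CLI is reachable.
# Prints a short message and returns non-zero when gcloud is missing.
check_gcloud() {
  if command -v gcloud >/dev/null 2>&1; then
    echo "gcloud found"
  else
    echo "gcloud not found: install the Google Cloud CLI first" >&2
    return 1
  fi
}
```

A possible use is `check_gcloud && gcloud auth application-default login`, so that the login command only runs when the CLI is installed.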

Location of the examples

A standard DQOps installation comes with a set of examples, which can be found in the examples/ directory. You can view a complete list of the examples with links to detailed explanations above.

Each example directory contains two configuration files: connection.dqoconnection.yaml, which stores the data source configuration, and a *.dqotable.yaml file, which stores the table and column metadata configuration.

While it is not necessary to manually add the connection in our examples, you can find information on how to do it in the Working with DQOps section.
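
To see which configuration files an example ships with, something like the following sketch may help. It relies only on the two file name patterns mentioned above; the function name `list_example_configs` and its argument are assumptions made for this illustration, and the search is recursive in case the files sit in subfolders.

```shell
# Hypothetical helper: list the DQOps configuration files under one
# example directory, matching the two file name patterns described above.
list_example_configs() {
  # $1: path to an example directory
  find "$1" -type f \( -name '*.dqoconnection.yaml' -o -name '*.dqotable.yaml' \) | sort
}
```

For instance, `list_example_configs examples/data-completeness/number-of-rows-in-the-table-bigquery` would print the connection and table files of that example, one path per line.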

Start DQOps

To start the DQOps application with the example, follow the steps below.

  1. Go to the directory where you installed DQOps and navigate, for example, to examples/data-completeness/number-of-rows-in-the-table-bigquery.

    Run the command provided below in the terminal. This will install DQOps on your computer.

    === "Windows"

     ```
     run_dqo
     ```
    

    === "MacOS/Linux"

     ```
     ./run_dqo
     ```
    
  2. Create the DQOps user home folder.

    After installation, you will be asked whether to initialize the DQOps user home folder in the default location. Type Y to create the folder.

    [Image: Initializing DQOps user home folder]

    The DQOps user home folder locally stores data such as sensor readings and check results, as well as data source configurations. You can learn more about data storage here.

  3. Log in to DQOps Cloud.

    To use DQOps features such as data quality dashboards or storing data quality definitions and results in the cloud, you must create a DQOps Cloud account.

    After creating the DQOps user home folder, you will be asked whether to log in to DQOps Cloud.

    [Image: Log in to DQOps Cloud]

    After typing Y, you will be redirected to https://cloud.dqops.com/registration, where you can create a new account, use Google single sign-on (SSO), or log in if you already have an account.

    During the first registration, a unique identification code (API Key) is generated and automatically passed to the DQOps application. The API Key is then stored in the configuration file.
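
The navigation part of step 1 can be sketched as a small shell function that resolves an example folder and fails early if it does not exist. The function name `example_dir` and the variable `DQO_INSTALL_DIR` are hypothetical names chosen for this illustration. The sketch assumes only that examples live under `examples/` in the installation directory, as the path in step 1 suggests.

```shell
# Hypothetical helper: print the full path of an example folder,
# or fail with a message when the folder does not exist.
example_dir() {
  # $1: DQOps installation directory
  # $2: example path relative to examples/
  if [ -d "$1/examples/$2" ]; then
    printf '%s\n' "$1/examples/$2"
  else
    echo "example not found: $1/examples/$2" >&2
    return 1
  fi
}
```

With `DQO_INSTALL_DIR` set to your installation directory, `cd "$(example_dir "$DQO_INSTALL_DIR" data-completeness/number-of-rows-in-the-table-bigquery)"` would move you into the example used above before you start DQOps.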