**docs/getting-started/add-data-source-connection.md**
This guide shows how to connect a data source to DQOps and import the metadata.
## Overview
After [installation and starting DQOps](installation.md), we describe how to add a connection to a CSV file using the user interface. We also present the example file used in this guide.
For a full description of how to add a data source connection to other providers, or how to add a connection using the command-line shell, see the [Working with DQOps section](../data-sources/index.md).
Links to some supported data sources are shown below.
Choose a CSV file that you want to analyze. To add a connection to a CSV file data source in DQOps, the file itself is all you need.
You can also download the CSV file used in this guide. The table below presents a fragment of its content.
| unique_key | address | census_tract | clearance_date | clearance_status | council_district_code | description | district | latitude | longitude | location | location_description | primary_type | timestamp | x_coordinate | y_coordinate | year | zipcode |
|------------|---------|--------------|----------------|------------------|-----------------------|-------------|----------|----------|-----------|----------|----------------------|--------------|-----------|--------------|--------------|------|---------|
| 2015821204 | "1713 MULLEN DR Austin, TX" || 2015-03-25 12:00:00.000000 UTC | Not cleared || THEFT | UK |||| 1713 MULLEN DR | Theft | 2015-03-23 12:00:00.000000 UTC ||| 2015 ||
| 2015150483 | "Austin, TX" || 2015-01-27 12:00:00.000000 UTC | Not cleared || RAPE | B |||| nan | Rape | 2015-01-15 12:00:00.000000 UTC ||| 2015 ||
| 2015331540 | "5510 S IH 35 SVRD Austin, TX" || 2015-02-11 12:00:00.000000 UTC | Not cleared || BURGLARY OF VEHICLE | UK |||| 5510 S IH 35 SVRD | Theft | 2015-02-02 12:00:00.000000 UTC ||| 2015 ||
| 2015331238 | "7928 US HWY 71 W Austin, TX" || 2015-02-12 12:00:00.000000 UTC | Not cleared || THEFT OF HEAVY EQUIPMENT | UK |||| 7928 US HWY 71 W | Theft | 2015-02-02 12:00:00.000000 UTC ||| 2015 ||
The file is a sample of the Austin Crime data from the BigQuery public dataset Austin Crime Data.
### Downloading the example file
To download the example CSV file, [open the GitHub webpage](https://github.com/dqops/dqo/blob/develop/dqops/sampledata/files/csv/austin_crime_sample/austin_crime.csv).

On the right side you will see a button with three dots. When the button is clicked, the **Download** option becomes available on the expanded list.
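If you prefer a script over the browser, you can fetch the same file directly. A minimal Python sketch, assuming the standard raw.githubusercontent.com mirror of the GitHub page linked above:

```python
import urllib.request

# Assumed raw-file mirror of the GitHub page linked above.
url = ("https://raw.githubusercontent.com/dqops/dqo/develop/"
       "dqops/sampledata/files/csv/austin_crime_sample/austin_crime.csv")

# Save the sample file into the current working directory.
urllib.request.urlretrieve(url, "austin_crime.csv")
```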
| CSV connection settings | Description |
|-------------------------|-------------|
| Connection name | The name of the connection that will be created in DQOps. This will also be the name of the folder where the connection configuration files are stored. The name of the connection must be unique and consist of alphanumeric characters. |
| Path | The path prefix to the parent directory with data. The path must be absolute. The virtual schema name is a value of the directories mapping. |
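To make the directories mapping concrete: DQOps exposes each file under the mapped parent directory as a table inside the virtual schema. A minimal sketch of that mapping in Python, where the directory path and the schema name "files" are assumptions taken from this guide:

```python
from pathlib import Path

# Assumed absolute path entered in the "Path" field.
parent_dir = Path("/data/austin_crime_sample")

# Each file under the mapped directory becomes a table in the
# virtual schema (here named "files"), keyed by its file name.
tables = {f"files.{p.name}": p for p in parent_dir.glob("*.csv")}
print(tables)  # e.g. {'files.austin_crime.csv': PosixPath(...)}
```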
After filling in the connection settings, click the **Test Connection** button to test the connection.
The test will inform you if the path to the CSV file is incorrect.
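You can also sanity-check the path yourself before testing the connection. A minimal sketch, where the file location is an assumption; adjust it to where you saved the sample:

```python
import csv
from pathlib import Path

# Assumed location of the downloaded sample file - adjust as needed.
path = Path("/data/austin_crime_sample/austin_crime.csv")

assert path.is_absolute(), "DQOps expects an absolute path"
assert path.exists(), f"file not found: {path}"

# Peek at the header row to confirm the file parses as CSV.
with path.open(newline="", encoding="utf-8") as f:
    header = next(csv.reader(f))
print(header)
```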
Click the **Save** connection button when the test is successful to add a new connection. Otherwise, you can check the details of what went wrong.

## Import metadata using the user interface
When you add a new connection, it will appear in the tree view on the left, and you will be redirected to the Import Metadata screen.
Now we can import schemas and tables.
1. Import the "files" schema by clicking on the **Import Tables** button.
You will be linked to the **Data Source** section, **Schedule** tab, where you can review the scheduling settings for the added connection.
The scheduling is enabled by default. You can turn it off by clicking the notification icon in the upper right corner and then clicking the **Job scheduler** toggle button.
## Explore the connection-level tabs in the Data sources section
At the table level in the **Data sources** section, there are the following tabs:
- **Date and time columns** - allows [configuring event and ingestion timestamp columns for timeliness checks](../working-with-dqo/run-data-quality-checks.md#configure-event-and-ingestion-timestamp-columns-for-timeliness-checks), as well as [date or datetime column for partition checks](../working-with-dqo/run-data-quality-checks.md#configure-date-or-datetime-column-for-partition-checks).
- **Incident configuration** - allows configuring incidents. [Learn more about incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) that let you keep track of the issues that arise during data quality monitoring.
You can check the details of the imported table by expanding the tree view on the left and selecting the "austin_crime.csv" table.
**docs/getting-started/index.md**
This guide contains a quick tutorial on how to get started with DQOps using the web interface, analyze a data source, and review the data quality results.
## Sample data
In the example, we will add a **connection to a CSV file** data source. The file contains a sample of the [BigQuery public dataset Austin Crime Data](https://console.cloud.google.com/marketplace/details/city-of-austin/austin-crime).

Next, we will run and review [Basic statistics](../working-with-dqo/collecting-basic-data-statistics.md) and the automatically added profiling and monitoring [data quality checks](../dqo-concepts/definition-of-data-quality-checks/index.md).
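DQOps collects these statistics itself, but you can get a rough local preview of what basic statistics cover. A minimal pandas sketch, assuming you downloaded the sample file from this guide as austin_crime.csv into the working directory:

```python
import pandas as pd

# Assumes the sample file from this guide sits in the working directory.
df = pd.read_csv("austin_crime.csv")

print(len(df))                                 # row count
print(df.isna().sum())                         # null count per column
print(df.describe(include="all").transpose())  # per-column summary stats
```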
Finally, we will review the data quality results on the [data quality dashboards](../dqo-concepts/types-of-data-quality-dashboards.md).
!!! note "Diverse connection options in DQOps"

    The CSV file connection is used in this *getting started* guide because no additional database configuration is needed.
The list of [data sources supported by DQOps](../data-sources/index.md) shows the connection screens to analyze data quality of other databases. The steps to connect to a different data source are the same as described in this *getting started* guide.