# MAGIC ### Prepare checks manually and save in the workspace (optional)
112
112
# MAGIC
113
113
# MAGIC You can modify the check candidates generated by the profiler to suit your needs. Alternatively, you can create checks manually, as demonstrated below, without using the profiler.
114
114
@@ -161,7 +161,7 @@
161
161
162
162
dq_engine=DQEngine(WorkspaceClient())
163
163
# save checks to the location specified in the default run configuration inside the workspace installation folder
docs/dqx/docs/demos.mdx
+2-3
@@ -4,8 +4,7 @@ sidebar_position: 4
4
4
5
5
# Demos
6
6
7
-
After the [installation](/docs/installation) of the framework,
8
-
you can import the following notebooks in the Databricks workspace to try it out:
7
+
[Install](/docs/installation) the framework, then import the following notebooks into the Databricks workspace to try it out:
9
8
* [DQX Demo Notebook (library)](https://github.com/databrickslabs/dqx/blob/main/demos/dqx_demo_library.py) - demonstrates how to use DQX as a library.
10
-
* [DQX Demo Notebook (tool)](https://github.com/databrickslabs/dqx/blob/main/demos/dqx_demo_tool.py) - demonstrates how to use DQX when installed in the workspace, including usage of DQX dashboards.
9
+
* [DQX Demo Notebook (tool)](https://github.com/databrickslabs/dqx/blob/main/demos/dqx_demo_tool.py) - demonstrates how to use DQX as a tool when installed in the workspace.
11
10
* [DQX DLT Demo Notebook](https://github.com/databrickslabs/dqx/blob/main/demos/dqx_dlt_demo.py) - demonstrates how to use DQX with Delta Live Tables (DLT).
- `check`: column expression containing "function" (check function to apply), "arguments" (check function arguments), and "col_name" (column name as `str` the check will be applied for) or "col_names" (column names as `array` the check will be applied for).
154
155
- (optional) `name` for the check: autogenerated if not provided.
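The fields listed above can be sketched as plain Python dictionaries. The snippet below is a minimal illustration, not DQX code: `validate_check` is a hypothetical helper, the placement of `col_name` / `col_names` inside `arguments` is an assumption, and the fallback naming scheme is made up for the example (DQX generates names in its own way).

```python
from typing import Any


def validate_check(check_def: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: validate one check definition and fill in a missing name."""
    check = check_def.get("check")
    if not isinstance(check, dict):
        raise ValueError("'check' must be a mapping")

    function = check.get("function")
    if not isinstance(function, str) or not function:
        raise ValueError("'check.function' must be a non-empty string")

    arguments = check.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("'check.arguments' must be a mapping")

    # Assumption: the target column(s) are passed inside "arguments".
    has_single = "col_name" in arguments
    has_many = "col_names" in arguments
    if has_single == has_many:
        raise ValueError("provide exactly one of 'col_name' or 'col_names'")
    if has_single and not isinstance(arguments["col_name"], str):
        raise ValueError("'col_name' must be a string")
    if has_many and not isinstance(arguments["col_names"], list):
        raise ValueError("'col_names' must be a list")

    columns = [arguments["col_name"]] if has_single else list(arguments["col_names"])

    # Illustrative naming scheme only; not the scheme DQX uses.
    name = check_def.get("name") or f"{'_'.join(columns)}_{function}"
    return {**check_def, "name": name}


checks = [
    # No name given: the helper derives one.
    {"check": {"function": "is_not_null", "arguments": {"col_names": ["col1", "col2"]}}},
    # Explicit name: kept as is.
    {
        "name": "col3_is_null_or_empty",
        "check": {"function": "is_not_null_and_not_empty", "arguments": {"col_name": "col3"}},
    },
]

validated = [validate_check(c) for c in checks]
for item in validated:
    print(item["name"], "->", item["check"]["function"])
```

Validating definitions before handing them to the engine keeps malformed entries from surfacing later as harder-to-read runtime errors.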
155
156
156
-
#### Loading and execution methods
157
+
### Loading and execution methods
157
158
158
-
**Method 1: load checks from a workspace file in the installation folder**
159
+
#### Method 1: Loading checks from a workspace file in the installation folder
159
160
160
161
If DQX is installed in the workspace, you can load checks based on the run configuration:
161
162
@@ -164,9 +165,10 @@ from databricks.labs.dqx.engine import DQEngine
164
165
from databricks.sdk import WorkspaceClient
165
166
166
167
dq_engine = DQEngine(WorkspaceClient())
167
-
168
168
# load check file specified in the run configuration
See details of the check functions [here](/docs/reference/#quality-rules--functions).
302
+
See details of the check functions [here](/docs/reference#quality-rules).
293
303
294
304
### Integration with DLT (Delta Live Tables)
295
305
296
306
DLT provides [expectations](https://docs.databricks.com/en/delta-live-tables/expectations.html) to enforce data quality constraints. However, expectations don't offer detailed insights into why certain checks fail.
297
307
The example below demonstrates how to integrate DQX with DLT to provide comprehensive quality information.
298
-
The DQX integration does not use expectations with DLT but DQX own methods.
308
+
The DQX integration with DLT does not use DLT Expectations but DQX's own methods.
299
309
300
-
**Option 1: apply quality rules and quarantine bad records**
310
+
#### Option 1: Apply quality rules and quarantine bad records
301
311
302
312
```python
303
313
import dlt
@@ -326,7 +336,7 @@ def quarantine():
326
336
return dq_engine.get_invalid(df)
327
337
```
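The quarantine idea can be pictured without Spark or DLT. The following is a conceptual sketch in plain Python, not the DQX implementation: the helpers `not_null`, `apply_checks`, and `split` are hypothetical, the reporting column names `_error` and `_warning` are assumed, and routing warning-only rows to the valid set is a simplification chosen for this example.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Check = Callable[[Row], Optional[str]]


def not_null(column: str) -> Check:
    """Build a check that returns a message when the column is missing or None."""
    def check(row: Row) -> Optional[str]:
        return None if row.get(column) is not None else f"{column} is null"
    return check


def apply_checks(rows: list[Row], error_checks: list[Check], warning_checks: list[Check]) -> list[Row]:
    """Annotate each row with `_error` and `_warning` (None when nothing failed)."""
    annotated = []
    for row in rows:
        errors = [msg for c in error_checks if (msg := c(row)) is not None]
        warnings = [msg for c in warning_checks if (msg := c(row)) is not None]
        annotated.append({**row, "_error": errors or None, "_warning": warnings or None})
    return annotated


def split(annotated: list[Row]) -> tuple[list[Row], list[Row]]:
    """Assumption for this sketch: only rows with errors are quarantined."""
    valid = [r for r in annotated if r["_error"] is None]
    quarantined = [r for r in annotated if r["_error"] is not None]
    return valid, quarantined


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": None, "email": "b@example.com"},
    {"id": 3, "email": None},
]
annotated = apply_checks(rows, error_checks=[not_null("id")], warning_checks=[not_null("email")])
valid, quarantined = split(annotated)
print(len(valid), "valid,", len(quarantined), "quarantined")
```

Because every row carries the messages of the checks it failed, the quarantined set explains why each record was rejected, which is the detail that plain expectations do not provide.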
328
338
329
-
**Option 2: apply quality rules as additional columns (`_warning` and `_error`)**
339
+
#### Option 2: Apply quality rules and report issues as additional columns
330
340
331
341
```python
332
342
import dlt
@@ -367,6 +377,29 @@ After executing the command:
367
377
Note: the dashboards use only the quarantined data as input, as defined during the installation process.
368
378
If you change the quarantine table in the run config after the deployment (`quarantine_table` field), you need to update the dashboard queries accordingly.
369
379
370
-
## Explore Quality Rules and Create Custom Checks
380
+
## Quality Rules and Creation of Custom Checks
381
+
382
+
Discover the full list of available data quality rules and learn how to define your own custom checks in our [Reference](/docs/reference#quality-rules) section.
383
+
384
+
## Details on DQX Engine and Workspace Client
385
+
386
+
To perform data quality checking with DQX, you need to create a `DQEngine` object.
387
+
The engine requires a Databricks workspace client for authentication and interaction with the Databricks workspace.
388
+
389
+
When running the code in a Databricks workspace (e.g. in a notebook or as a job), the workspace client is automatically authenticated.
390
+
For external environments (e.g. CI servers or local machines), you can authenticate using any method supported by the Databricks SDK. Detailed instructions are available in the [default authentication flow](https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html#default-authentication-flow).
391
+
392
+
If you use Databricks [configuration profiles](https://docs.databricks.com/dev-tools/auth.html#configuration-profiles) or Databricks-specific [environment variables](https://docs.databricks.com/dev-tools/auth.html#environment-variables) for authentication, you only need the following code to create a workspace client:
393
+
```python
394
+
from databricks.sdk import WorkspaceClient
395
+
from databricks.labs.dqx.engine import DQEngine
396
+
397
+
ws = WorkspaceClient()
398
+
399
+
# use the workspace client to create the DQX engine
400
+
dq_engine = DQEngine(ws)
401
+
```
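For external environments such as CI servers, it can help to confirm that the expected environment variables are present before creating the client. The sketch below assumes personal access token authentication, which the Databricks SDK reads from `DATABRICKS_HOST` and `DATABRICKS_TOKEN`; other authentication methods rely on different variables. The helper `missing_auth_env_vars` is hypothetical and not part of DQX or the SDK.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumption: personal access token authentication via environment variables.
REQUIRED_ENV_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_auth_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Hypothetical helper: return the required variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not source.get(name)]


missing = missing_auth_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    # With both variables set, WorkspaceClient() can be created without arguments.
    print("Authentication variables are set.")
```

Running a check like this at the start of a CI job gives a clear message up front instead of an authentication failure deeper in the pipeline.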
402
+
403
+
For details on the specific methods available in the engine, visit the [reference](/docs/reference#dq-engine-methods) section.
371
404
372
-
Discover the full list of available data quality rules and learn how to define your own custom checks in our [Reference](/docs/reference) section.
405
+
Information on testing applications that use `DQEngine` can be found [here](/docs/reference#testing-applications-using-dqx).