
Commit

Merge pull request #31 from databrickslabs/feature/dlt-meta-uc-cli
- Added hugo docs for databricks labs cli option
ravi-databricks authored Dec 21, 2023
2 parents 2214076 + 0ad5d67 commit b0e2e31
Showing 12 changed files with 338 additions and 99 deletions.
12 changes: 2 additions & 10 deletions CHANGELOG.md
@@ -1,17 +1,9 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

**NOTE:** For CLI interfaces, we support SemVer approach. However, for API components we don't use SemVer as of now. This may lead to instability when using dbx API methods directly.

[Please read through the Keep a Changelog (~5min)](https://keepachangelog.com/en/1.0.0/).

## [v0.0.5]
- enabled UC (link to PR)
- databricks labs cli integration (link to PR)
- Enabled Unity Catalog support: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)
- Added Databricks Labs CLI integration: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)

## [v0.0.4] - 2023-10-09
### Added
90 changes: 90 additions & 0 deletions docs/content/getting_started/additionals1.md
@@ -0,0 +1,90 @@
---
title: "Additionals"
date: 2021-08-04T14:25:26-04:00
weight: 21
draft: false
---
#### [DLT-META](https://github.com/databrickslabs/dlt-meta) DEMOs
1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): Automatically ingests hundreds of data sources into bronze and silver DLT pipelines.


##### DAIS 2023 DEMO
This demo launches Bronze and Silver DLT pipelines with the following activities:
- Runs an initial load of the Customer and Transactions feeds
- Adds the new Product and Stores feeds to the existing Bronze and Silver DLT pipelines through metadata changes
- Runs the Bronze and Silver DLT pipelines incrementally for CDC events

##### Steps:
1. Launch a terminal/command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```git clone https://github.com/databrickslabs/dlt-meta.git ```

4. ```cd dlt-meta```

5. Set the `PYTHONPATH` environment variable in your terminal:
```
export PYTHONPATH=<<local dlt-meta path>>
```
6. Run the command ```python demo/launch_dais_demo.py --username=<<your databricks username>> --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
    - cloud_provider_name : aws, azure, or gcp
    - dbr_version : Databricks Runtime version
    - dbfs_path : path on your Databricks workspace where the demo will be copied for launching the DLT-META pipelines
    - You can provide `--profile=<profile name>` if you have already configured the Databricks CLI (see the profile sketch after this list); otherwise the command prompt will ask for a host and token:
    - 6a. Databricks workspace URL:
      - Enter your workspace URL, in the format `https://<instance-name>.cloud.databricks.com`. To get your workspace URL, see Workspace instance names, URLs, and IDs.
    - 6b. Token:
      - In your Databricks workspace, click your Databricks username in the top bar, then select User Settings from the drop-down.
      - On the Access tokens tab, click Generate new token.
      - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
      - Click Generate.
      - Copy the displayed token.
      - Paste it into the command prompt.
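If you already have a personal access token, you can store it in a Databricks CLI profile instead of pasting it interactively. A minimal sketch of `~/.databrickscfg` (the profile name and values are placeholders):
```
[dais-demo]
host  = https://<instance-name>.cloud.databricks.com
token = <your personal access token>
```
You can then pass `--profile=dais-demo` to the demo command in step 6.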
##### Databricks Tech Summit FY2024 DEMO:
This demo launches hundreds of auto-generated tables inside a single bronze and silver DLT pipeline using dlt-meta.
1. Launch a terminal/command prompt
2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
3. ```git clone https://github.com/databrickslabs/dlt-meta.git ```
4. ```cd dlt-meta```
5. Set the `PYTHONPATH` environment variable in your terminal:
```
export PYTHONPATH=<<local dlt-meta path>>
```
6. Run the command ```python demo/launch_techsummit_demo.py --username=<<your databricks username>> --source=cloudfiles --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated```
    - cloud_provider_name : aws, azure, or gcp
    - dbr_version : Databricks Runtime version
    - dbfs_path : path on your Databricks workspace where the demo will be copied for launching the DLT-META pipelines
    - You can provide `--profile=<profile name>` if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token:
    - 6a. Databricks workspace URL:
      - Enter your workspace URL, in the format `https://<instance-name>.cloud.databricks.com`. To get your workspace URL, see Workspace instance names, URLs, and IDs.
    - 6b. Token:
      - In your Databricks workspace, click your Databricks username in the top bar, then select User Settings from the drop-down.
      - On the Access tokens tab, click Generate new token.
      - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
      - Click Generate.
      - Copy the displayed token.
      - Paste it into the command prompt.
@@ -1,12 +1,11 @@
---
title: "Additionals"
date: 2021-08-04T14:25:26-04:00
weight: 21
weight: 22
draft: false
---
This is the easiest way to launch dlt-meta in your Databricks workspace, using the following steps.

## Run Integration Tests
#### Run Integration Tests
1. Launch a terminal/command prompt

2. Go to the DLT-META directory
13 changes: 0 additions & 13 deletions docs/content/getting_started/buildwhl.md

This file was deleted.

72 changes: 72 additions & 0 deletions docs/content/getting_started/dltpipelineopt1.md
@@ -0,0 +1,72 @@
---
title: "Launch Generic DLT pipeline"
date: 2021-08-04T14:25:26-04:00
weight: 20
draft: false
---
## Option#1: Databricks Labs CLI
##### Prerequisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
- Python 3.8.0+
##### Steps:
```shell
git clone https://github.com/databrickslabs/dlt-meta.git
cd dlt-meta
python -m venv .venv
source .venv/bin/activate
pip install databricks-sdk
databricks labs dlt-meta onboard
```
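Depending on your CLI version, the dlt-meta labs project may need to be registered before the `databricks labs dlt-meta ...` commands are available. A hedged sketch using the CLI's standard `labs` subcommands:
```shell
# Register the dlt-meta labs project with the Databricks CLI
databricks labs install dlt-meta
# Confirm it appears among installed labs projects
databricks labs installed
```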

- Once the onboarding job has finished, deploy the `bronze` and `silver` DLT pipelines using the command below
#### Deploy Bronze DLT
```shell
databricks labs dlt-meta deploy
```
- The above command will prompt you for DLT details. Provide the same schema details you supplied in the onboarding steps above:
```shell
Deploy DLT-META with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: uc_catalog_name
Deploy DLT-META with serverless?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
[0] bronze
[1] silver
Enter a number between 0 and 1: 0
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9
Provide bronze dataflowspec table name (default: bronze_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_bronze_pipeline_2aee):
Provide dlt target schema name: dltmeta_bronze_cf595
```

#### Deploy Silver DLT
```shell
databricks labs dlt-meta deploy
```
- The above command will prompt you for DLT details. Provide the same schema details you supplied in the onboarding steps above:
```shell
Deploy DLT-META with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: uc_catalog_name
Deploy DLT-META with serverless?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
[0] bronze
[1] silver
Enter a number between 0 and 1: 1
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9
Provide silver dataflowspec table name (default: silver_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_silver_pipeline_21475):
Provide dlt target schema name: dltmeta_silver_5afa2
```
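If you want to trigger the deployed pipelines from the CLI rather than the workspace UI, a hedged sketch (the pipeline IDs are placeholders you can look up first):
```shell
# List pipelines to find the IDs created by the deploy step
databricks pipelines list-pipelines
# Trigger an update on each pipeline
databricks pipelines start-update <bronze-pipeline-id>
databricks pipelines start-update <silver-pipeline-id>
```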
@@ -1,11 +1,12 @@
---
title: "Launch Generic DLT pipeline"
date: 2021-08-04T14:25:26-04:00
weight: 20
weight: 21
draft: false
---
### Option#2: Manual

### 1. Create a Delta Live Tables launch notebook
#### 1. Create a Delta Live Tables launch notebook

1. Go to your Databricks landing page and select Create a notebook, or click New Icon New in the sidebar and select Notebook. The Create Notebook dialog appears.

98 changes: 54 additions & 44 deletions docs/content/getting_started/runoboardingopt1.md
@@ -1,54 +1,64 @@
---
title: "Running Onboarding"
title: "Run Onboarding"
date: 2021-08-04T14:25:26-04:00
weight: 17
draft: false
---

#### Option#1: Python whl job
1. Go to your Databricks landing page and do one of the following:
#### Option#1: Databricks Labs CLI
##### Prerequisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
- Python 3.8.0+
##### Steps:
1. ```git clone https://github.com/databrickslabs/dlt-meta.git```
2. ```cd dlt-meta```
3. ```python -m venv .venv```
4. ```source .venv/bin/activate```
5. ```pip install databricks-sdk```

2. In the sidebar, click Jobs Icon Workflows and click Create Job Button.
##### Run the dlt-meta CLI command:
```shell
databricks labs dlt-meta onboard
```
- The above command will prompt you for onboarding details.
- If you have cloned the dlt-meta git repo, accepting the defaults will launch the config from the [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf) folder.
- You can create your own onboarding files (e.g. onboarding.json, data quality rules, and silver transformations) and put them in the conf folder, as shown in [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf); a sketch of an onboarding entry follows below.
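For orientation, here is a heavily abridged sketch of what a single entry in an onboarding file can look like. The field names below are illustrative and may not match the current schema exactly; treat the template in [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf) as the source of truth.
```json
[
  {
    "data_flow_id": "100",
    "data_flow_group": "A1",
    "source_format": "cloudFiles",
    "source_details": {
      "source_path_dev": "dbfs:/demo/resources/data/customers"
    },
    "bronze_database_dev": "dltmeta_bronze",
    "bronze_table": "customers",
    "bronze_reader_options": {
      "cloudFiles.format": "json"
    },
    "silver_database_dev": "dltmeta_silver",
    "silver_table": "customers",
    "silver_transformation_json_dev": "dbfs:/demo/conf/silver_transformations.json"
  }
]
```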

3. In the sidebar, click New Icon New and select Job from the menu.

4. In the task dialog box that appears on the Tasks tab, replace Add a name for your job… with your job name, for example, Python wheel example.

5. In Task name, enter a name for the task, for example, ```dlt_meta_onboarding_pythonwheel_task```.

6. In Type, select Python wheel.

7. In Package name, enter ```dlt_meta```.

8. In Entry point, enter ```run```.

9. Click Add under Dependent Libraries. In the Add dependent library dialog, under Library Type, click PyPI. Enter Package = ```dlt-meta```.

10. Click Add.

11. In Parameters, select keyword arguments, then select JSON. Paste the JSON parameters below:
```json
{
"onboard_layer": "bronze_silver",
"database": "dlt_demo",
"onboarding_file_path": "dbfs:/onboarding_files/users_onboarding.json",
"silver_dataflowspec_table": "silver_dataflowspec_table",
"silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
"bronze_dataflowspec_table": "bronze_dataflowspec_table",
"import_author": "Ravi",
"version": "v1",
"bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
"onboard_layer": "bronze_silver",
"uc_enabled": "False",
"overwrite": "True",
"env": "dev"
}
```

```shell
Provide onboarding file path (default: demo/conf/onboarding.template):
Provide onboarding files local directory (default: demo/):
Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
Provide databricks runtime version (default: 14.2.x-scala2.12):
Run onboarding with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: uc_catalog_name
Provide dlt meta schema name (default: dlt_meta_dataflowspecs_203b9):
Provide dlt meta bronze layer schema name (default: dltmeta_bronze_cf595):
Provide dlt meta silver layer schema name (default: dltmeta_silver_5afa2):
Provide dlt meta layer
[0] bronze
[1] bronze_silver
[2] silver
Enter a number between 0 and 2: 1
Provide bronze dataflow spec table name (default: bronze_dataflowspec):
Provide silver dataflow spec table name (default: silver_dataflowspec):
Overwrite dataflow spec?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dataflow spec version (default: v1):
Provide environment name (default: prod): prod
Provide import author name (default: ravi.gawai):
Provide cloud provider name
[0] aws
[1] azure
[2] gcp
Enter a number between 0 and 2: 0
Do you want to update workspace paths, catalog, and schema details in your onboarding file?
[0] False
[1] True
```

Alternatively, you can enter keyword arguments: click + Add and enter a key and value. Click + Add again to enter more arguments.

12. Click Save task.

13. Click Run now.

14. Make sure the job runs successfully. Verify the metadata in the dataflow spec tables you entered in step 11, e.g. ```dlt_demo.bronze_dataflowspec_table```, ```dlt_demo.silver_dataflowspec_table```
- Go to your Databricks workspace and locate the onboarding job under Workflows -> Job runs
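As an alternative to the UI clicks in the Python wheel option above, the same onboarding job can be created programmatically. A minimal sketch using the databricks-sdk (cluster settings are illustrative placeholders; the named parameters mirror the JSON shown earlier):
```python
# A hedged sketch, not the project's official launcher: create the Python
# wheel onboarding job with the databricks-sdk instead of the UI steps above.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()  # reads host/token from ~/.databrickscfg or env vars

created = w.jobs.create(
    name="dlt_meta_onboarding_pythonwheel",
    tasks=[
        jobs.Task(
            task_key="dlt_meta_onboarding_pythonwheel_task",
            python_wheel_task=jobs.PythonWheelTask(
                package_name="dlt_meta",
                entry_point="run",
                named_parameters={
                    "onboard_layer": "bronze_silver",
                    "database": "dlt_demo",
                    "onboarding_file_path": "dbfs:/onboarding_files/users_onboarding.json",
                    "bronze_dataflowspec_table": "bronze_dataflowspec_table",
                    "bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
                    "silver_dataflowspec_table": "silver_dataflowspec_table",
                    "silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
                    "import_author": "Ravi",
                    "version": "v1",
                    "uc_enabled": "False",
                    "overwrite": "True",
                    "env": "dev",
                },
            ),
            # The dlt-meta package is pulled from PyPI as a dependent library
            libraries=[compute.Library(pypi=compute.PythonPyPiLibrary(package="dlt-meta"))],
            new_cluster=compute.ClusterSpec(
                spark_version="13.3.x-scala2.12",
                node_type_id="i3.xlarge",  # placeholder node type
                num_workers=1,
            ),
        )
    ],
)
print(f"Created job: {created.job_id}")
```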
