Merge pull request #31 from databrickslabs/feature/dlt-meta-uc-cli
- Added Hugo docs for the Databricks Labs CLI option
Showing 12 changed files with 338 additions and 99 deletions.
@@ -0,0 +1,90 @@
---
title: "Additionals"
date: 2021-08-04T14:25:26-04:00
weight: 21
draft: false
---
#### [DLT-META](https://github.com/databrickslabs/dlt-meta) Demos
1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental mode.
2. [Databricks Tech Summit Demo](#databricks-tech-summit-fy2024-demo): Ingests hundreds of data sources into bronze and silver DLT pipelines automatically.
##### DAIS 2023 DEMO
This demo launches Bronze and Silver DLT pipelines with the following activities:
- Loads Customer and Transactions feeds for the initial load
- Adds new Product and Stores feeds to the existing Bronze and Silver DLT pipelines through metadata changes
- Runs the Bronze and Silver DLT pipelines in incremental mode to process CDC events
##### Steps:
1. Launch a terminal/command prompt

2. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```git clone https://github.com/databrickslabs/dlt-meta.git```

4. ```cd dlt-meta```

5. Set the `PYTHONPATH` environment variable in your terminal:
```
export PYTHONPATH=<<local dlt-meta path>>
```
6. Run the command ```python demo/launch_dais_demo.py --username=<<your databricks username>> --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
    - cloud_provider_name: aws, azure, or gcp
    - dbr_version: Databricks Runtime version
    - dbfs_path: path in your Databricks workspace where the demo will be copied for launching DLT-META pipelines
    - You can pass `--profile=<your Databricks CLI profile name>` if you already have a Databricks CLI profile configured; otherwise the command prompt will ask for a host and token:
      - 6a. Databricks workspace URL:
        - Enter your workspace URL, in the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
      - 6b. Token:
        - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
        - On the Access tokens tab, click Generate new token.
        - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
        - Click Generate.
        - Copy the displayed token and paste it into the command prompt.
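For concreteness, here is a hypothetical filled-in invocation of the step 6 command; the username, catalog name, and profile name are made-up example values, not defaults shipped with the demo:
```shell
# Example values only: substitute your own username, UC catalog, and CLI profile.
python demo/launch_dais_demo.py \
  --username=first.last@example.com \
  --source=cloudfiles \
  --uc_catalog_name=dev_catalog \
  --cloud_provider_name=aws \
  --dbr_version=13.3.x-scala2.12 \
  --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new \
  --profile=my-workspace-profile
```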
##### Databricks Tech Summit FY2024 DEMO:
This demo launches hundreds of auto-generated tables inside a single bronze and a single silver DLT pipeline using dlt-meta.
1. Launch a terminal/command prompt
2. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
3. ```git clone https://github.com/databrickslabs/dlt-meta.git```
4. ```cd dlt-meta```
5. Set the `PYTHONPATH` environment variable in your terminal:
```
export PYTHONPATH=<<local dlt-meta path>>
```
6. Run the command ```python demo/launch_techsummit_demo.py --username=ravi.gawai@databricks.com --source=cloudfiles --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated```
    - cloud_provider_name: aws, azure, or gcp
    - dbr_version: Databricks Runtime version
    - dbfs_path: path in your Databricks workspace where the demo will be copied for launching DLT-META pipelines
    - You can pass `--profile=<your Databricks CLI profile name>` if you already have a Databricks CLI profile configured (see the profile sketch after this list); otherwise the command prompt will ask for a host and token:
      - 6a. Databricks workspace URL:
        - Enter your workspace URL, in the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
      - 6b. Token:
        - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
        - On the Access tokens tab, click Generate new token.
        - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
        - Click Generate.
        - Copy the displayed token and paste it into the command prompt.
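To skip the interactive host/token prompts in either demo, you can configure a named Databricks CLI profile once and pass it with `--profile`. A minimal sketch, assuming the standard `databricks configure` flow and a made-up profile name:
```shell
# One-time setup: prompts for your workspace URL and personal access token,
# then stores them under the named profile in ~/.databrickscfg.
databricks configure --profile my-workspace-profile

# Reuse the stored profile when launching the demo (example values).
python demo/launch_techsummit_demo.py \
  --username=first.last@example.com \
  --source=cloudfiles \
  --cloud_provider_name=aws \
  --dbr_version=13.3.x-scala2.12 \
  --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated \
  --profile=my-workspace-profile
```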
docs/content/getting_started/additionals.md → docs/content/getting_started/additionals2.md (5 changes: 2 additions & 3 deletions)
This file was deleted.
@@ -0,0 +1,72 @@
---
title: "Launch Generic DLT pipeline"
date: 2021-08-04T14:25:26-04:00
weight: 20
draft: false
---
## Option#1: Databricks Labs CLI
##### Pre-requisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
- Python 3.8.0+
##### Steps:
```shell
git clone https://github.com/databrickslabs/dlt-meta.git
cd dlt-meta
python -m venv .venv
source .venv/bin/activate
pip install databricks-sdk
databricks labs dlt-meta onboard
```
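If the `databricks labs dlt-meta` commands are not found, the Labs project may not be registered with your CLI yet. A sketch of the install step, assuming the standard `databricks labs` project commands available in recent CLI versions:
```shell
# Register the dlt-meta Labs project with the Databricks CLI.
databricks labs install dlt-meta

# Verify the project now appears among installed Labs projects.
databricks labs installed
```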
- Once the onboarding job has finished, deploy the `bronze` and `silver` DLT pipelines using the command below.
#### Deploy Bronze DLT
```shell
databricks labs dlt-meta deploy
```
- The above command will prompt you for DLT details. Provide the same schema details you entered during onboarding:
```shell
Deploy DLT-META with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: uc_catalog_name
Deploy DLT-META with serverless?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
[0] bronze
[1] silver
Enter a number between 0 and 1: 0
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9
Provide bronze dataflowspec table name (default: bronze_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_bronze_pipeline_2aee):
Provide dlt target schema name: dltmeta_bronze_cf595
```
#### Deploy Silver DLT
```shell
databricks labs dlt-meta deploy
```
- The above command will prompt you for DLT details. Provide the same schema details you entered during onboarding:
```shell
Deploy DLT-META with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: uc_catalog_name
Deploy DLT-META with serverless?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
[0] bronze
[1] silver
Enter a number between 0 and 1: 1
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9
Provide silver dataflowspec table name (default: silver_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_silver_pipeline_21475):
Provide dlt target schema name: dltmeta_silver_5afa2
```
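Once both deploys finish, you can sanity-check that the pipelines were created before running them. A sketch, assuming the current Databricks CLI exposes the DLT Pipelines API as the `pipelines` command group; the name filter matches the generated defaults shown above:
```shell
# List pipelines in the workspace and look for the generated dlt_meta names.
databricks pipelines list-pipelines | grep dlt_meta
```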
docs/content/getting_started/dltpipeline.md → ...ontent/getting_started/dltpipelineopt2.md (5 changes: 3 additions & 2 deletions)
@@ -1,54 +1,64 @@
This hunk retitles the page from "Running Onboarding" to "Run Onboarding" and replaces the Python whl job walkthrough with the Databricks Labs CLI flow. The new front matter:
---
title: "Run Onboarding"
date: 2021-08-04T14:25:26-04:00
weight: 17
draft: false
---

Removed:

#### Option#1: Python whl job
1. Go to your Databricks landing page and do one of the following:
2. In the sidebar, click Workflows and click Create Job.
3. In the sidebar, click New and select Job from the menu.
4. In the task dialog box that appears on the Tasks tab, replace "Add a name for your job…" with your job name, for example, Python wheel example.
5. In Task name, enter a name for the task, for example, ```dlt_meta_onboarding_pythonwheel_task```.
6. In Type, select Python wheel.
7. In Package name, enter ```dlt_meta```.
8. In Entry point, enter ```run```.
9. Click Add under Dependent Libraries. In the Add dependent library dialog, under Library Type, click PyPI. Enter Package = ```dlt-meta```.
10. Click Add.
11. In Parameters, select keyword arguments, then select JSON. Paste the JSON parameters below:
```json
{
  "onboard_layer": "bronze_silver",
  "database": "dlt_demo",
  "onboarding_file_path": "dbfs:/onboarding_files/users_onboarding.json",
  "silver_dataflowspec_table": "silver_dataflowspec_table",
  "silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
  "bronze_dataflowspec_table": "bronze_dataflowspec_table",
  "import_author": "Ravi",
  "version": "v1",
  "bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
  "uc_enabled": "False",
  "overwrite": "True",
  "env": "dev"
}
```
Alternately, you can enter keyword arguments: click + Add and enter a key and value. Click + Add again to enter more arguments.
12. Click Save task.
13. Run now.
14. Make sure the job runs successfully. Verify the metadata in the dataflow spec tables you configured in step 11, e.g. ```dlt_demo.bronze_dataflowspec_table```, ```dlt_demo.silver_dataflowspec_table```.

Added:

#### Option#1: Databricks Labs CLI
##### Pre-requisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
- Python 3.8.0+
##### Steps:
1. ```git clone https://github.com/databrickslabs/dlt-meta.git```
2. ```cd dlt-meta```
3. ```python -m venv .venv```
4. ```source .venv/bin/activate```
5. ```pip install databricks-sdk```

##### Run the dlt-meta CLI command:
```shell
databricks labs dlt-meta onboard
```
- The above command will prompt you for onboarding details.
- If you have cloned the dlt-meta git repo, accepting the defaults launches the config from the [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf) folder.
- You can create your own onboarding files (e.g. onboarding.json, data quality and silver transformation specs) and place them in the conf folder, as shown in [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf).
```shell
Provide onboarding file path (default: demo/conf/onboarding.template):
Provide onboarding files local directory (default: demo/):
Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
Provide databricks runtime version (default: 14.2.x-scala2.12):
Run onboarding with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: uc_catalog_name
Provide dlt meta schema name (default: dlt_meta_dataflowspecs_203b9):
Provide dlt meta bronze layer schema name (default: dltmeta_bronze_cf595):
Provide dlt meta silver layer schema name (default: dltmeta_silver_5afa2):
Provide dlt meta layer
[0] bronze
[1] bronze_silver
[2] silver
Enter a number between 0 and 2: 1
Provide bronze dataflow spec table name (default: bronze_dataflowspec):
Provide silver dataflow spec table name (default: silver_dataflowspec):
Overwrite dataflow spec?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dataflow spec version (default: v1):
Provide environment name (default: prod): prod
Provide import author name (default: ravi.gawai):
Provide cloud provider name
[0] aws
[1] azure
[2] gcp
Enter a number between 0 and 2: 0
Do you want to update ws paths, catalog, schema details to your onboarding file?
[0] False
[1] True
```
- Go to your Databricks workspace and locate the onboarding job under: Workflows -> Job runs
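You can also confirm the onboarding job run from the terminal. A sketch, assuming the current Databricks CLI `jobs` commands and the made-up profile name from the demos above:
```shell
# List jobs and find the dlt-meta onboarding job (names may vary per run).
databricks jobs list --profile my-workspace-profile

# Inspect recent runs of that job by its numeric job ID.
databricks jobs list-runs --job-id <job-id> --profile my-workspace-profile
```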