This repository contains Terraform code to deploy a Databricks workspace for training purposes on AWS. It creates the following resources:
- AWS VPC, with
  - 4 subnets
    - 2 private subnets (connected to the Internet via the NAT Gateway with a routing table), each in a different availability zone
    - 2 public subnets (connected to the Internet Gateway with a routing table), each in a different availability zone
  - 1 NAT Gateway, in the first public subnet's availability zone
  - 1 Internet Gateway
  - 1 routing table
    - 0.0.0.0/0 to the Internet Gateway
  - 1 Network ACL
    - 1 ingress rule within the VPC (by VPC CIDR)
    - 1 ingress rule from all (required by Databricks)
    - 1 egress rule within the VPC (by VPC CIDR)
    - 2 egress rules to Internet HTTP(S) (TCP/80 and TCP/443)
    - 2 egress rules required to communicate with Databricks infrastructure (TCP/3306 and TCP/6666); see the sketch after this list
  - 1 Security Group
    - 1 ingress rule within the VPC
    - 1 egress rule within the VPC
    - 2 egress rules to Internet HTTP(S) (TCP/80 and TCP/443)
    - 2 egress rules required to communicate with Databricks infrastructure (TCP/3306 and TCP/6666)
- AWS S3 Bucket for Databricks Unity Catalog (region-specific)
  - Important! Only one Databricks Unity Catalog metastore can be set up per AWS region. If you want to reuse an existing Databricks Unity Catalog, change the Terraform code accordingly.
- AWS S3 IAM Policy for Databricks to access the S3 bucket
- "Databricks on AWS" Storage configuration
- "Databricks on AWS" Network configuration
- "Databricks on AWS" Workspace (E2) (region-specific)
- "Databricks on AWS" Clusters
- Instructors' Clusters
- Data Engineering
- Machine Learning
- Students' Clusters
- Data Engineering
- Machine Learning
- Instructors' Clusters
- AWS Databricks Training Materials ((c) Databricks)
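As an illustration of the Databricks-specific egress rules above, here is a minimal sketch using `aws_network_acl_rule`. The resource names, rule numbers, and the referenced `aws_network_acl.this` are assumptions for illustration, not names taken from this repository.

```hcl
# Hypothetical sketch: the two Databricks-specific egress NACL rules.
# All names and rule numbers here are illustrative, not from this repository.
resource "aws_network_acl_rule" "egress_databricks_metastore" {
  network_acl_id = aws_network_acl.this.id # assumed NACL resource
  rule_number    = 200
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 3306 # Databricks-managed Hive metastore (MySQL)
  to_port        = 3306
}

resource "aws_network_acl_rule" "egress_databricks_scc_relay" {
  network_acl_id = aws_network_acl.this.id
  rule_number    = 210
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 6666 # Databricks secure cluster connectivity relay
  to_port        = 6666
}
```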
The following pre-requisites must be in place before deploying:

- AWS User for Terraform, with an access key/secret generated.
- "Databricks on AWS" account, already created by following the official documentation (see the pre-requisite links at the end of this README).
- "Databricks on AWS" Credential configuration (see the setup guide). Use "Customer-managed VPC with default restrictions policy" for training purposes.
- Databricks Group `Databricks Unity Catalog Administrators` (created separately from this project).
- Databricks Service Principal, already created in the "Databricks on AWS" account; a provider authentication sketch follows this list.
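For reference, a minimal sketch of how the account-level Databricks provider can authenticate with such a service principal (OAuth M2M), assuming the variable names used in the tfvars example below:

```hcl
# Minimal sketch of account-level provider auth via a service principal (OAuth M2M).
# Variable names mirror the tfvars example below; check the repository's actual code.
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

provider "databricks" {
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}
```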
Create a `<file>.tfvars` file with the following values:

```hcl
aws_region                              = "<AWS region>"
aws_access_key                          = "<AWS Access Key ID>"
aws_secret_key                          = "<AWS Secret Value>"
aws_terraform_role_arn                  = "<AWS Terraform Role ARN>"
databricks_account_id                   = "<`Databricks on AWS` account ID>"
databricks_client_id                    = "<`Databricks on AWS` Service Principal ID>"
databricks_client_secret                = "<`Databricks on AWS` Service Principal secret>"
databricks_account_cloud_credentials_id = "<`Databricks on AWS` Credential Configuration ID>"
```
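These keys presumably map to input variables declared in the repository's code, along the lines of this sketch (the `sensitive` flags are assumptions):

```hcl
# Assumed variable declarations matching the tfvars keys above.
variable "aws_region" { type = string }

variable "aws_access_key" {
  type      = string
  sensitive = true
}

variable "aws_secret_key" {
  type      = string
  sensitive = true
}

variable "aws_terraform_role_arn" { type = string }
variable "databricks_account_id" { type = string }
variable "databricks_client_id" { type = string }

variable "databricks_client_secret" {
  type      = string
  sensitive = true
}

variable "databricks_account_cloud_credentials_id" { type = string }
```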
The values in `backend.tfvars` must be hard-coded (variables cannot be used), as this is a limitation of the backend configuration:

```hcl
region = "<AWS region>"
bucket = "<AWS S3 Bucket Name>"
key    = "<AWS S3 Object Name>"
```
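These values complete a partial S3 backend declaration; a minimal sketch of what that block looks like (assumed, check the repository's actual code):

```hcl
# Assumed partial S3 backend declaration that backend.tfvars completes.
terraform {
  backend "s3" {
    # region, bucket, and key are supplied via
    # terraform init --backend-config=backend.tfvars
  }
}
```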
- Install the `aws` CLI and `terraform`.
- Log in with the AWS CLI: run `aws configure`.
- `cd` to the correct sub-folder first, e.g. `cd ./environments/20240426`.
- Install the Terraform providers: run `terraform init --backend-config=backend.tfvars`.
- Check that the planned changes look correct: run `terraform plan -var-file='<file>.tfvars' -out='<file>.tfplan'`.
- Deploy the infrastructure: run `terraform apply '<file>.tfplan'`.
- To remove the whole deployment: run `terraform plan -destroy -var-file='<file>.tfvars' -out='<file-destroy>.tfplan'`, and then `terraform apply '<file-destroy>.tfplan'`.
The user list can be modified to suit your needs, e.g. the number of users required.
Because this repository is used for creating training workspaces, the users are divided into 2 groups: `Instructors` and `Students`.
The example format of the usernames is `student01.databricks.<training-date-yyyyMMdd>@<your email domain>`; a sketch for generating such a list follows.
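A minimal sketch, with hypothetical variable names (`student_count`, `training_date`, `email_domain`), of how such a user list could be generated in Terraform:

```hcl
# Hypothetical sketch for generating the student user list; names are illustrative.
variable "student_count" {
  type    = number
  default = 10
}

variable "training_date" {
  type    = string
  default = "20240426" # yyyyMMdd
}

variable "email_domain" {
  type    = string
  default = "example.com"
}

locals {
  # Produces: student01.databricks.20240426@example.com, student02..., etc.
  student_users = [
    for i in range(1, var.student_count + 1) :
    format("student%02d.databricks.%s@%s", i, var.training_date, var.email_domain)
  ]
}
```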
The pre-requisite setup documents are listed in the links below.
- Databricks administration introduction | Databricks on AWS
- OAuth machine-to-machine (M2M) authentication | Databricks on AWS
- Databricks Terraform provider | Databricks on AWS
- Create Databricks workspaces using Terraform | Databricks on AWS
- Docs overview | databricks/databricks | Terraform | Terraform Registry
Terraform providers used:
- hashicorp/aws
- databricks/databricks