Skip to content

Latest commit

 

History

History
66 lines (48 loc) · 2.78 KB

README.md

File metadata and controls

66 lines (48 loc) · 2.78 KB

Job Metadata Extraction and Management in Databricks

This repository contains a Python-based solution for automating the extraction and management of job metadata in Databricks. The solution leverages the Databricks API to fetch job details, such as job configurations, execution schedules, and cluster information, and stores them in a centralized Delta table for easy access and monitoring.

Table of Contents

Overview

Managing and monitoring jobs in Databricks can become challenging as the number of jobs grows. This solution automates the extraction of job metadata using the Databricks API and stores it in a centralized Delta table. This centralized repository can be used to create dashboards, monitor job health, and take proactive actions.

Prerequisites

Before using this solution, ensure you have the following:

  1. Databricks Workspace: Access to a Databricks workspace with administrative privileges.
  2. Personal Access Token (PAT): Generate a PAT from your Databricks workspace for API authentication.
  3. Azure Key Vault: Store your PAT securely in Azure Key Vault or any other secret management tool.
  4. Databricks CLI: Optional but helpful for managing Databricks resources.

Setup

  1. Clone the Repository:
    git clone https://github.com/your-username/job-metadata-databricks.git
    cd job-metadata-databricks
    

Set Up Environment:

  1. Ensure you have Python installed.
  2. Install the required Python packages: ``` pip install requests pandas pyspark

Configure Databricks:

  1. Set up your Databricks workspace URL and PAT in the job_metadata_information.ipynb notebook.
  2. Ensure your Databricks cluster has access to the Azure Key Vault where the PAT is stored.

Usage

Run the Notebook:

  1. Open the job_metadata_information.ipynb notebook in your Databricks workspace.
  2. Execute the cells to fetch job metadata and store it in the centralized Delta table.

Monitor Job Metadata:

  1. Use the centralized Delta table to create dashboards and monitor job configurations, execution schedules, and more.

Code Structure

job_metadata_information.ipynb: The main notebook that contains the code for fetching job metadata and storing it in a Delta table. README.md: This file, providing an overview and instructions for using the repository.

Contributing

Contributions are welcome! Please follow these steps to contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request.