🔄 ELT Project with Docker, PostgreSQL, Airbyte, dbt, and Airflow


A Custom Extract, Load, Transform (ELT) Process using Docker and PostgreSQL

Effortlessly extract data from a source PostgreSQL database, transform it, and load it into a destination database using Docker containers.

  1. 🤖 Introduction
  2. 🌲 Branches
  3. ⚙️ Tech Stack
  4. 🔋 How It Works
  5. 🤸 Getting Started
  6. 🔗 Links
  7. 🚀 Conclusion

🤖 Introduction

This ELT project demonstrates a custom Extract, Load, Transform (ELT) process built on Docker and PostgreSQL. The project includes a source database, a destination database, and an ELT script that moves data between them. The key steps are:

  • Extracts data from a source PostgreSQL database.
  • Transforms data as necessary using a Python script.
  • Loads the transformed data into a destination PostgreSQL database.
  • Orchestrates the whole process with Docker.

🌲 Branches

This project has multiple branches to explore different ELT workflows. Switch to the appropriate branch to try out other implementations:

Branch | Description | Switch Command
main | ELT Project with Docker and PostgreSQL (current branch). | git checkout main
airbyte | A Dockerized ELT Workflow Using PostgreSQL, dbt, and Airflow. | git checkout airbyte
airflow | ELT Project with Docker, PostgreSQL, dbt, and Airflow. | git checkout airflow
cron | ELT Project with Docker, PostgreSQL, dbt, and CRON. | git checkout cron
dbt | ELT Project with Docker, PostgreSQL, and dbt. | git checkout dbt

To switch branches, run the appropriate git checkout command listed above.

⚙️ Tech Stack

  • Docker: Containerization of the entire application stack.
  • PostgreSQL: Both source and destination databases for data storage.
  • Python: ELT scripting language for extracting, transforming, and loading data.
  • Docker Compose: Manages multi-container Docker applications.

🔋 How It Works

Docker Compose

  • The docker-compose.yaml file orchestrates three Docker containers:
    1. Source PostgreSQL Database: Contains sample data.
    2. Destination PostgreSQL Database: Where the data is loaded.
    3. ELT Python Script: Extracts data from the source, transforms it, and loads it into the destination database.

ELT Process

  • The Python script (elt_script.py) waits for the source PostgreSQL database to become available.
  • Once available, the script uses pg_dump to extract the data.
  • It then uses psql to load the extracted data into the destination PostgreSQL database.
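
Put together, the core of elt_script.py can be sketched in a few dozen lines of Python that shell out to the standard PostgreSQL client tools. The sketch below is illustrative rather than a copy of the actual script: the host names, database names, and credentials are placeholders (the real values live in elt_script.py and docker-compose.yaml), and the transform step is omitted so the dump-and-restore flow stays visible.

import os
import subprocess
import time

# Illustrative connection settings -- the real values live in docker-compose.yaml.
SOURCE = {"host": "source_postgres", "dbname": "source_db", "user": "postgres", "password": "secret"}
DEST = {"host": "destination_postgres", "dbname": "destination_db", "user": "postgres", "password": "secret"}

def wait_for_postgres(host, retries=30, delay=2):
    # Poll pg_isready until the database accepts connections.
    for _ in range(retries):
        if subprocess.run(["pg_isready", "-h", host], capture_output=True).returncode == 0:
            return
        time.sleep(delay)
    raise RuntimeError(f"Postgres at {host} never became available")

wait_for_postgres(SOURCE["host"])

# Extract: dump the whole source database to a plain-SQL file.
subprocess.run(
    ["pg_dump", "-h", SOURCE["host"], "-U", SOURCE["user"], "-d", SOURCE["dbname"], "-f", "data_dump.sql"],
    env={**os.environ, "PGPASSWORD": SOURCE["password"]},
    check=True,
)

# Load: replay the dump against the destination database.
subprocess.run(
    ["psql", "-h", DEST["host"], "-U", DEST["user"], "-d", DEST["dbname"], "-f", "data_dump.sql"],
    env={**os.environ, "PGPASSWORD": DEST["password"]},
    check=True,
)

Any transformation logic would slot in between the dump and the load; the dbt branch handles that step with dbt models instead.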

Database Initialization

  • The init.sql script initializes the source database with sample data, including tables for users, films, categories, actors, and film actors.

🤸 Getting Started

Follow these steps to set up and run the project locally.

Prerequisites: Make sure you have Docker and Docker Compose installed on your machine.

Clone the Repository: Clone the repository to your local machine:

git clone https://github.com/TheODDYSEY/Elt-Project.git

Navigate to the Directory: Navigate to the project directory:

cd Elt-Project

Start Docker Containers: Run the following command to start the Docker containers:

docker-compose up

Access the Destination Database: Once the containers are up and running, the ELT process will start automatically. After the ELT process completes, you can access the source and destination PostgreSQL databases on ports 5433 and 5434, respectively. Use the following command to access the destination PostgreSQL database:

docker exec -it elt-project-destination_postgres-1 psql -U postgres

View the Database and Tables:

\c destination_db   -- Connects to the destination database named destination_db
\dt                 -- Lists all tables in the current database
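
You can also verify the load from the host machine rather than from inside the container, since the destination database is reachable on localhost port 5434. A minimal check in Python, assuming the psycopg2-binary package is installed and that the user, password, and database name match what docker-compose.yaml defines, might look like this:

import psycopg2

# Connect through the port that Docker Compose publishes for the destination database.
# The password here is a placeholder -- use the one configured in docker-compose.yaml.
conn = psycopg2.connect(
    host="localhost",
    port=5434,
    dbname="destination_db",
    user="postgres",
    password="secret",
)

with conn, conn.cursor() as cur:
    # Rough equivalent of \dt: list the tables in the public schema.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' ORDER BY table_name"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)

conn.close()

If the ELT run succeeded, this prints the same tables that init.sql created in the source database (users, films, categories, and so on).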

🚀 Conclusion

Congratulations! You’ve successfully set up and run the ELT project using Docker and PostgreSQL. Explore the other branches to experience more advanced ELT workflows and expand your understanding of modern data processing pipelines.
