Effortlessly extract data from a source PostgreSQL database, transform it, and load it into a destination database using Docker containers.
## Table of Contents

- Introduction
- Branches
- Tech Stack
- How It Works
- Getting Started
- Links
- Conclusion
## Introduction

This project demonstrates a custom Extract, Load, Transform (ELT) process using Docker and PostgreSQL. It includes a source database, a destination database, and an ELT script that facilitates the data transfer. The pipeline:

- Extracts data from a source PostgreSQL database.
- Transforms the data as needed using a Python script.
- Loads the transformed data into a destination PostgreSQL database.
- Orchestrates all of the above with Docker.
## Branches

This project has multiple branches that explore different ELT workflows. Switch to the appropriate branch to try out other implementations:
| Branch | Description | Switch Command |
| --- | --- | --- |
| `main` | ELT Project with Docker and PostgreSQL (current branch). | `git checkout main` |
| `airbyte` | A Dockerized ELT Workflow Using PostgreSQL, dbt, and Airbyte. | `git checkout airbyte` |
| `airflow` | ELT Project with Docker, PostgreSQL, dbt, and Airflow. | `git checkout airflow` |
| `cron` | ELT Project with Docker, PostgreSQL, dbt, and CRON. | `git checkout cron` |
| `dbt` | ELT Project with Docker, PostgreSQL, and dbt. | `git checkout dbt` |

To switch branches, run the appropriate `git checkout` command listed above.
## Tech Stack

- Docker: Containerization of the entire application stack.
- PostgreSQL: Both source and destination databases for data storage.
- Python: ELT scripting language for extracting, transforming, and loading data.
- Docker Compose: Manages the multi-container Docker application.
## How It Works

- The `docker-compose.yaml` file orchestrates three Docker containers:
  - Source PostgreSQL Database: Contains sample data.
  - Destination PostgreSQL Database: Where the data is loaded.
  - ELT Python Script: Extracts data from the source, transforms it, and loads it into the destination database.
- The Python script (`elt_script.py`) waits for the source PostgreSQL database to become available.
- Once available, the script uses `pg_dump` to extract the data.
- It then uses `psql` to load the extracted data into the destination PostgreSQL database.
- The `init.sql` script initializes the source database with sample data, including tables for users, films, categories, actors, and film actors.
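For orientation, here is a minimal sketch of what such a script might look like. The host names, database names, and credentials below are illustrative assumptions, not the project's actual configuration; check `docker-compose.yaml` and `elt_script.py` for the real values.

```python
import os
import subprocess
import time

# Illustrative connection settings -- adjust to match docker-compose.yaml.
SOURCE = {"host": "source_postgres", "port": "5432", "db": "source_db",
          "user": "postgres", "password": "secret"}
DEST = {"host": "destination_postgres", "port": "5432", "db": "destination_db",
        "user": "postgres", "password": "secret"}


def wait_for_postgres(cfg, retries=30, delay=2):
    """Poll the database with pg_isready until it accepts connections."""
    for _ in range(retries):
        check = subprocess.run(
            ["pg_isready", "-h", cfg["host"], "-p", cfg["port"], "-U", cfg["user"]],
            capture_output=True,
        )
        if check.returncode == 0:
            return
        time.sleep(delay)
    raise RuntimeError(f"Postgres at {cfg['host']} never became available")


wait_for_postgres(SOURCE)

# Extract: dump the source database to a plain-SQL file.
subprocess.run(
    ["pg_dump", "-h", SOURCE["host"], "-p", SOURCE["port"], "-U", SOURCE["user"],
     "-d", SOURCE["db"], "-f", "data_dump.sql"],
    env={**os.environ, "PGPASSWORD": SOURCE["password"]},
    check=True,
)

# Load: replay the dump into the destination database.
subprocess.run(
    ["psql", "-h", DEST["host"], "-p", DEST["port"], "-U", DEST["user"],
     "-d", DEST["db"], "-f", "data_dump.sql"],
    env={**os.environ, "PGPASSWORD": DEST["password"]},
    check=True,
)
```

Any row-level transformation would happen between the dump and the load, for example by rewriting `data_dump.sql` before `psql` replays it.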
## Getting Started

Follow these steps to set up and run the project locally.

**Prerequisites**: Make sure you have Docker and Docker Compose installed on your machine.

**Clone the Repository**: Clone the repository to your local machine:

```bash
git clone https://github.com/TheODDYSEY/Elt-Project.git
```

**Navigate to the Directory**: Change into the project directory:

```bash
cd Elt-Project
```

**Start the Docker Containers**: Run the following command to start the Docker containers:

```bash
docker-compose up
```
**Access the Destination Database**: Once the containers are up and running, the ELT process starts automatically. After it completes, you can access the source and destination PostgreSQL databases on ports `5433` and `5434`, respectively. Use the following command to access the destination PostgreSQL database:

```bash
docker exec -it elt-project-destination_postgres-1 psql -U postgres
```
**View the Database and Tables**:

```sql
\c destination_db -- Connects to the destination database named destination_db
\dt               -- Lists all tables in the current database
```
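If you prefer to inspect the destination database from Python instead of `psql`, the sketch below lists its tables. It assumes `psycopg2-binary` is installed (`pip install psycopg2-binary`); the password is a placeholder for whatever `docker-compose.yaml` actually sets.

```python
import psycopg2  # pip install psycopg2-binary

# Port 5434 comes from this README; the password is a placeholder --
# use whatever docker-compose.yaml sets for the destination container.
conn = psycopg2.connect(
    host="localhost", port=5434, dbname="destination_db",
    user="postgres", password="secret",
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' ORDER BY table_name;"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```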
## Conclusion

Congratulations! You've successfully set up and run the ELT project using Docker and PostgreSQL. Explore the other branches to experience more advanced ELT workflows and expand your understanding of modern data processing pipelines.