ELT with Airflow, Meltano, Streamlit, and PostgreSQL

Don't speak English? Click here to view this page in Portuguese.

pre-commit makefile docker postgresql

This project is a solution for data extraction, loading, and transformation (ELT) using Airflow, Meltano, Streamlit, and PostgreSQL. It extracts data from different sources, loads it into a PostgreSQL database, and visualizes the results in a Streamlit application.

It includes the Pre-Commit framework to manage and maintain pre-commit hooks, ensuring the code follows standards established by the Python community.


Table of Contents

  • Prerequisites
  • Project Architecture
  • Why was JSONL chosen?
  • Setup
  • Accessing Services
  • Running Meltano in the Terminal
  • Stopping Services
  • Troubleshooting
  • Contribution

Prerequisites

  • Docker
  • Docker Compose
  • Make (optional but recommended)

Project Architecture

(Architecture diagram: see the image in the repository.)

Why was JSONL chosen?

JSONL (JSON Lines) was chosen as the data storage format for this project because it is flexible and fits well into modern pipelines. Here are the main reasons; a short reading sketch follows the list:

  1. Simple Structure: It allows storing complex and nested data, such as lists and objects, without "flattening" the data.
  2. Stream Processing: Each line is an independent JSON object, making it suitable for processing large datasets without loading everything into memory.
  3. Compatibility: It is widely supported by modern ETL tools and APIs, making integration easier.
  4. Easy Debugging: An error in one line does not affect the entire file, making it easier to identify and fix issues.
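
As a minimal illustration of the streaming and debugging points above, the sketch below reads a JSONL extract one record at a time; the file path and field handling are hypothetical examples, not taken from this project.

    import json

    # Hypothetical path to a date-partitioned JSONL extract; adjust to your layout.
    path = "output/2025-01-03/example_extract.jsonl"

    with open(path, encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)  # each line is an independent JSON object
            except json.JSONDecodeError:
                # a malformed line can be logged and skipped without discarding the file
                print(f"skipping malformed line {line_number}")
                continue
            # hand the record to the next stage (here it is simply printed)
            print(record)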

Setup

  1. Clone the repository:

    git clone git@github.com:Robso-creator/elt_meltano_ind.git
    cd elt_meltano_ind
  2. Create the .env file in the project root with the following content:

    POSTGRES_USER=postgres
    POSTGRES_PASSWORD=postgres
  3. Check the Docker version to ensure it is installed correctly:

    docker --version
  4. Build the Streamlit image and start the containers:

    make build  # Build the Streamlit image
    make up     # Start the containers

Accessing Services

  • Airflow: Access localhost:8080 to manage and execute the DAGs that extract data from sources and load it into the database.

    • On the DAGs page, enable the dag_elt_meltano_pipeline DAG and wait for it to run.

    • If you want to run it for a specific date, click the play button on the right side of the DAG screen.

    • Then choose the date you want to run and click 'Trigger'.

    • If any of the extraction tasks fails, the loading task will not execute (a minimal sketch of such a DAG follows this list).

  • Streamlit: Access localhost:8501 to view the Streamlit application with the processed results.
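
A minimal sketch of how such a DAG could chain the Meltano commands is shown below. The task ids, the Meltano project path, and the schedule are illustrative assumptions; the actual dag_elt_meltano_pipeline in this repository may be defined differently.

    import pendulum
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Render YEAR/MONTH/DAY from the run's logical date ({{ ds }} is YYYY-MM-DD).
    ENV_DATE = (
        'YEAR={{ macros.ds_format(ds, "%Y-%m-%d", "%Y") }} '
        'MONTH={{ macros.ds_format(ds, "%Y-%m-%d", "%m") }} '
        'DAY={{ macros.ds_format(ds, "%Y-%m-%d", "%d") }}'
    )
    MELTANO_DIR = "cd /project/meltano && "  # assumed location of the Meltano project

    with DAG(
        dag_id="dag_elt_meltano_pipeline_sketch",
        start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_postgres = BashOperator(
            task_id="extract_postgres_to_jsonl",
            bash_command=MELTANO_DIR + "SOURCE=postgres " + ENV_DATE + " meltano run extract-postgres-to-jsonl",
        )
        extract_csv = BashOperator(
            task_id="extract_csv_to_jsonl",
            bash_command=MELTANO_DIR + "SOURCE=csv " + ENV_DATE + " meltano run extract-csv-to-jsonl",
        )
        load = BashOperator(
            task_id="load_jsonl_to_postgres",
            bash_command=MELTANO_DIR + ENV_DATE + " meltano run load-jsonl-to-postgres",
        )

        # the load task runs only if both extraction tasks succeed
        [extract_postgres, extract_csv] >> load

With the default trigger rule, a failed extraction task leaves the load task in an upstream_failed state, which matches the behavior described above.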

Running Meltano in the Terminal

To run Meltano directly in the terminal, use the following commands:

    make enter-local
    SOURCE=postgres YEAR=2025 MONTH=01 DAY=03 meltano run extract-postgres-to-jsonl
    SOURCE=csv YEAR=2025 MONTH=01 DAY=03 meltano run extract-csv-to-jsonl
    YEAR=2025 MONTH=01 DAY=03 meltano run load-jsonl-to-postgres
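
To confirm that the load step actually wrote data into PostgreSQL, a quick check from the host can look like the sketch below. The host, port, and database name are assumptions for illustration; the credentials are the ones defined in the .env file created during setup.

    import psycopg2  # any PostgreSQL client works; psycopg2 is used here for illustration

    # Host, port, and database name are assumptions; adjust to your setup.
    conn = psycopg2.connect(
        host="localhost",
        port=5432,
        dbname="postgres",
        user="postgres",      # POSTGRES_USER from .env
        password="postgres",  # POSTGRES_PASSWORD from .env
    )

    with conn, conn.cursor() as cur:
        # list the tables in the public schema to see what the loader created
        cur.execute(
            "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'"
        )
        for (table_name,) in cur.fetchall():
            print(table_name)

    conn.close()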

Stopping Services

To stop the containers, use the following command:

    make down   # Stop the containers
    make rm     # Remove stopped containers and volumes

Troubleshooting

If you cannot access the Airflow page and see the error Already running on PID <PID> in the output of make logs-webserver, follow the steps below:

  1. Stop the containers:

    make down
  2. Check if any process is using port 8080:

    sudo lsof -i tcp:8080
  3. If there is a process, kill it:

    sudo kill -9 <PID>
  4. Remove the Airflow PID file:

    sudo rm -rf meltano/orchestrate/airflow-webserver.pid
  5. Restart the containers:

    make up

Contribution

Contributions are welcome! Feel free to open issues and pull requests.
