Data Engineering

This README contains links to my data engineering portfolio projects and learning materials.

Projects

AWS YouTube Data Analysis

  • Tools Used: Python, SQL, AWS, Lambda, Athena, S3, IAM, Glue, QuickSight
  • Analyzed YouTube trending video data using AWS services to build a scalable pipeline for data ingestion, ETL, and storage in a centralized data lake. Created QuickSight dashboards highlighting video views by country, category, and region. Workflow included ingestion, preprocessing, cataloging, and analysis.

Real-Time Data Streaming of Random User Data

  • Tools Used: Python, PostgreSQL, Docker, Airflow, Kafka, Spark, Cassandra, Zookeeper
  • Built a robust, scalable, and fault-tolerant pipeline using a modern tech stack. The pipeline ingests, processes, and stores random user data from an API.

Azure Medallion Architecture Pipeline

  • Tools Used: Python, SQL, Azure, dbt, Databricks
  • Implemented a complete data engineering pipeline using the Medallion Architecture (Bronze, Silver, and Gold layers) within Azure Databricks. It integrates several Azure services and dbt (Data Build Tool) to orchestrate data ingestion, transformation, and storage, ensuring a robust, scalable, and secure solution.
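
For readers unfamiliar with the Medallion Architecture, the layering can be sketched in plain Python. This is only an illustration of the Bronze, Silver, and Gold idea, not the project's code: the function names (`to_bronze`, `to_silver`, `to_gold`) and the sample order records are hypothetical, and the project itself runs on Azure Databricks with dbt rather than in-memory lists.

```python
from collections import defaultdict


def to_bronze(raw_rows):
    """Bronze: keep records as received, only tagging where they came from."""
    return [{**row, "_source": "raw"} for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: drop incomplete and duplicate records, normalize and cast fields."""
    seen, clean = set(), []
    for row in bronze_rows:
        if row.get("order_id") is None or row.get("amount") in (None, ""):
            continue  # incomplete record
        if row["order_id"] in seen:
            continue  # duplicate record
        seen.add(row["order_id"])
        clean.append({
            "order_id": row["order_id"],
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return clean


def to_gold(silver_rows):
    """Gold: aggregate cleaned records into a reporting-ready summary."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Hypothetical sample data, for illustration only.
raw = [
    {"order_id": 1, "region": " East ", "amount": "10.5"},
    {"order_id": 1, "region": " East ", "amount": "10.5"},  # duplicate
    {"order_id": 2, "region": "west", "amount": None},      # incomplete
    {"order_id": 3, "region": "East", "amount": "4.5"},
]

gold = to_gold(to_silver(to_bronze(raw)))
print(gold)
```

Each layer reads only from the one before it, so the raw data stays untouched in Bronze while cleaning rules (Silver) and aggregations (Gold) can change independently.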

ELT Pipeline

  • Tools Used: Python, SQL, Airflow, Snowflake, dbt
  • Built a simple ELT pipeline using dbt (Data Build Tool) to transform data in Snowflake, with orchestration managed by Apache Airflow. This setup showcases a modern data engineering workflow, essential for handling large-scale data transformations efficiently.

Learning Materials

The Data Engineering Academy

Data Engineering Zoomcamp