kmdawkins/portfolio-project2

Project 2A: Database Normalization & Execution

Objective

Design, normalize, and fully implement a relational database that enforces data integrity, reduces redundancy, and prepares the data for advanced SQL querying.

Overview

This project transitions the reconstructed flat-file financial transaction data from Project 1 into a fully normalized PostgreSQL database schema. The focus was on database normalization, schema optimization, and relational table design to ensure efficient querying and storage.

The end goal was to:

  • Apply database normalization principles to eliminate redundancy and improve query efficiency.
  • Design a well-structured relational schema to support financial tracking across Projects, Campaigns, and Employee Roles.
  • Execute schema creation, data migrations, and indexing to optimize SQL querying for future analysis.

Data Description

The structured budget database consists of six core tables:

  1. pmo – Stores financial transactions, including payments, contracts, and project details.
  2. campaigns – Houses campaign metadata, including budget allocations and timelines.
  3. campaign_mapping – Acts as a bridge table, linking Projects and Campaigns.
  4. campaign_projects – Maps Projects to Campaigns for financial tracking.
  5. org_structure – Stores employee assignments and management hierarchy.
  6. agency_contracts – Manages contractual agreements between agencies and campaigns.

Each table follows normalization best practices, ensuring clear relationships between entities while supporting complex SQL joins and aggregations.
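As a sketch of the bridge-table design described above, the link between campaigns and projects might be declared as follows. This is illustrative only: apart from campaign_id and project_no, which the document names, all column names and types here are assumptions, not the actual schema.

```sql
-- Illustrative sketch; columns other than campaign_id and project_no
-- are assumed, not taken from the real schema.
CREATE TABLE campaigns (
    campaign_id   SERIAL PRIMARY KEY,
    campaign_name TEXT NOT NULL,
    budget        NUMERIC(14, 2),
    start_date    DATE,
    end_date      DATE
);

CREATE TABLE campaign_mapping (
    campaign_id INT  NOT NULL REFERENCES campaigns (campaign_id),
    project_no  TEXT NOT NULL,
    -- composite primary key prevents duplicate project-campaign links
    PRIMARY KEY (campaign_id, project_no)
);
```

A bridge table like this keeps the many-to-many relationship in one place, so neither side stores repeated foreign keys.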


Key Steps & Technologies

Step 1: Database Design & Normalization

  • Designed a fully normalized relational schema, applying 1NF, 2NF, and 3NF principles.
  • Optimized entity relationships, reducing redundancy while maintaining financial tracking granularity.
  • Created an Entity Relationship Diagram (ERD) to visualize table relationships and validate normalized dependencies.

Step 2: Schema Implementation & Indexing

  • Developed the schema in PostgreSQL, defining primary keys, foreign keys, constraints, and indexes.
  • Implemented indexing strategies for performance optimization, including:
    • Primary Keys for uniqueness enforcement
    • Foreign Keys for referential integrity
    • Indexes on frequently queried columns (e.g., campaign_id, project_no, payment_no)
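The indexing strategy above could be expressed as the following DDL. Index names and table/column pairings are assumptions inferred from the columns called out in the list; the actual scripts live in sql_migrations/schema_changes/.

```sql
-- Hypothetical index DDL; names and pairings are illustrative assumptions.
CREATE INDEX idx_pmo_payment_no      ON pmo (payment_no);
CREATE INDEX idx_pmo_project_no      ON pmo (project_no);
CREATE INDEX idx_mapping_campaign_id ON campaign_mapping (campaign_id);
```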

Step 3: Data Migration & Transformation

  • Cleansed and transformed data to align with the normalized schema.
  • Migrated data from flat files (pmo_backup) into relational tables, applying:
    • Foreign key lookups to replace denormalized fields
    • CASE statements to dynamically update descriptions
    • Window Functions to assign unique sequence numbers to transactional records
  • Automated batch data inserts, ensuring transaction safety with PostgreSQL’s BEGIN; COMMIT; structure.
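Under the same caveat (column and table names beyond pmo, pmo_backup, and campaigns are assumed for illustration), one migration batch combining these three techniques might be shaped like:

```sql
BEGIN;  -- wrap the batch so any failure rolls the whole insert back

INSERT INTO pmo (payment_no, campaign_id, description, seq_no, amount)
SELECT
    b.payment_no,
    c.campaign_id,                      -- FK lookup replaces the denormalized name
    CASE                                -- CASE dynamically updates descriptions
        WHEN b.description IS NULL OR b.description = '' THEN 'Unclassified'
        ELSE b.description
    END,
    ROW_NUMBER() OVER (                 -- window function assigns unique sequence numbers
        PARTITION BY c.campaign_id
        ORDER BY b.payment_date
    ),
    b.amount
FROM pmo_backup AS b
JOIN campaigns  AS c ON c.campaign_name = b.campaign_name;

COMMIT;
```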

Step 4: Git Version Control & Repository Cleanup

  • Implemented Git versioning to track:
    • Schema changes (sql_migrations/schema_changes/)
    • Data migrations (sql_migrations/data_migrations/)
    • Data insertions (sql_migrations/data_inserts/)
  • Refactored the Git repository structure, improving organization and readability:
    • ✅ Consolidated schema updates into schema_changes/
    • ✅ Moved data transformation scripts into data_migrations/
    • ✅ Structured temporary SQL exercises into sql_queries/exercises/
    • ✅ Removed unnecessary backups and ensured .gitignore prevents large files from being committed
  • Ensured repository cleanup was performed exclusively via Git commands, reinforcing best practices in version control.

Step 5: Final Database Execution & Testing

  • Executed full schema deployment (20250211_v1.0_create_budget_schema.sql), successfully migrating 255,000+ records.
  • Verified database integrity using sanity checks, including:
    • Column alignment verification before data insertion
    • Duplicate checks on payment_no to ensure uniqueness
    • Foreign key constraints validation to maintain referential integrity
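The sanity checks above translate into simple queries; a hedged sketch, assuming the column names shown:

```sql
-- Duplicate check: any payment_no appearing more than once fails the check.
SELECT payment_no, COUNT(*) AS occurrences
FROM pmo
GROUP BY payment_no
HAVING COUNT(*) > 1;

-- Referential-integrity spot check: pmo rows whose campaign_id has no parent
-- in campaigns. Should return zero rows if the FK constraints held.
SELECT p.payment_no
FROM pmo AS p
LEFT JOIN campaigns AS c ON c.campaign_id = p.campaign_id
WHERE p.campaign_id IS NOT NULL
  AND c.campaign_id IS NULL;
```

Running both before and after a large batch insert catches migration errors early, before downstream queries depend on the data.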

Entity Relationship Diagram (ERD)

Below is the ERD representing the fully normalized database structure.

ERD - Budget Database (diagrams/erd.budget.png)


Repository Structure

portfolio-project2/
│── data/
│   ├── agency_contracts.csv
│   ├── org_structure.csv
│── diagrams/
│   ├── budget_schema.dbml
│   ├── erd.budget.png
│── docs/
│   ├── script_execution_log.md
│── sql_migrations/
│   ├── data_inserts/
│   ├── data_migrations/
│   ├── rollback_scripts/
│   ├── schema_changes/
│── sql_queries/
│   ├── exercises/
│── .gitignore
│── README.md


Key Insights & Takeaways

  • Database normalization significantly reduces redundancy, improving query efficiency.
  • Foreign key constraints enforce referential integrity, preventing orphaned records.
  • Indexing high-frequency lookup columns enhances performance for complex SQL queries.
  • Pre-insert sanity checks (e.g., verifying duplicates, ensuring column alignment) prevent migration failures.
  • Window functions are a powerful tool for assigning unique identifiers during data migration.
  • Git versioning is essential for maintaining schema history and tracking database modifications.
  • Git repository cleanup and organization are critical for clarity and maintainability.


Next Steps

This project (Project 2A: Database Normalization & Execution) serves as a foundation for the next phase:

🚀 Project 2B: Advanced SQL Querying

  • Writing complex SQL queries, including Joins, Common Table Expressions (CTEs), and Window Functions.
  • Performance tuning, leveraging EXPLAIN ANALYZE for query optimization.
  • Financial reporting and trend analysis, creating queries for budget tracking, spend patterns, and forecasting.

🚀 Project 3A: Python ETL Workflow Conversion

  • Rebuilding the database schema via Python, transitioning from manual SQL migrations to Python-based ETL workflows.
  • Automating ETL pipelines, using Pandas, SQLAlchemy, and psycopg2 for database integration.
  • Implementing Apache Airflow, scheduling incremental data ingestion and workflow orchestration.
  • Expanding Git repository automation, improving tagging strategies and CI/CD integration.

Final Thoughts

This project was an invaluable learning experience in:

  • Database design, normalization, and migration
  • SQL performance optimization
  • Git repository cleanup and best practices in version control

This structured approach ensures Project 2A is a strong foundation for automation in Project 3A.
Looking ahead, the focus shifts from manual SQL workflows to fully automated ETL solutions.


💡 Special Note:
This repository is purposefully structured to showcase my transition from manual SQL-based database management (Project 2A) to fully automated ETL pipelines in Python (Project 3A).
By maintaining a well-documented version history, this project serves as a reference for industry best practices in database engineering, automation, and workflow orchestration. 🚀