Design, normalize, and fully implement a relational database that optimizes data integrity, reduces redundancy, and prepares for advanced SQL querying.
This project transitions reconstructed flat-file financial transaction data (Project 1) into a fully normalized PostgreSQL database schema. The focus was on database normalization, schema optimization, and relational table design to ensure efficient querying and storage.
The end goal was to:
- Apply database normalization principles to eliminate redundancy and improve query efficiency.
- Design a well-structured relational schema to support financial tracking across Projects, Campaigns, and Employee Roles.
- Execute schema creation, data migrations, and indexing to optimize SQL querying for future analysis.
The structured budget database consists of six core tables:
- pmo – Stores financial transactions, including payments, contracts, and project details.
- campaigns – Houses campaign metadata, including budget allocations and timelines.
- campaign_mapping – Acts as a bridge table, linking Projects and Campaigns.
- campaign_projects – Maps Projects to Campaigns for financial tracking.
- org_structure – Stores employee assignments and management hierarchy.
- agency_contracts – Manages contractual agreements between agencies and campaigns.
Each table follows normalization best practices, ensuring clear relationships between entities while supporting complex SQL joins and aggregations.
- Designed a fully normalized relational schema, applying 1NF, 2NF, and 3NF principles.
- Optimized entity relationships, reducing redundancy while maintaining financial tracking granularity.
- Created an Entity Relationship Diagram (ERD) to visualize table relationships and normalize dependencies.
- Developed the schema in PostgreSQL, defining primary keys, foreign keys, constraints, and indexes.
- Implemented indexing strategies for performance optimization, including:
- Primary Keys for uniqueness enforcement
- Foreign Keys for referential integrity
- Indexes on frequently queried columns (e.g.,
campaign_id
,project_no
,payment_no
)
- Cleansed and transformed data to align with the normalized schema.
- Migrated data from flat files (
pmo_backup
) into relational tables, applying:- Foreign key lookups to replace denormalized fields
- CASE statements to dynamically update descriptions
- Window Functions to assign unique sequence numbers to transactional records
- Automated batch data inserts, ensuring transaction safety with PostgreSQL’s
BEGIN; COMMIT;
structure.
- Implemented Git versioning to track:
- Schema changes (
sql_migrations/schema_changes/
) - Data migrations (
sql_migrations/data_migrations/
) - Data insertions (
sql_migrations/data_inserts/
)
- Schema changes (
- Refactored the Git repository structure, improving organization and readability:
- ✅ Consolidated schema updates into
schema_changes/
- ✅ Moved data transformation scripts into
data_migrations/
- ✅ Structured temporary SQL exercises into
sql_queries/exercises/
- ✅ Removed unnecessary backups and ensured
.gitignore
prevents large files from being committed
- ✅ Consolidated schema updates into
- Ensured repository cleanup was performed exclusively via Git commands, reinforcing best practices in version control.
- Executed full schema deployment (
20250211_v1.0_create_budget_schema.sql
), successfully migrating 255,000+ records. - Verified database integrity using sanity checks, including:
- Column alignment verification before data insertion
- Duplicate checks on
payment_no
to ensure uniqueness - Foreign key constraints validation to maintain referential integrity
Below is the ERD representing the fully normalized database structure.
portfolio-project2/ │── data/ │ ├── agency_contracts.csv │ ├── org_structure.csv │ │── diagrams/ │ ├── budget_schema.dbml │ ├── erd.budget.png │ │── docs/ │ ├── script_execution_log.md │ │── sql_migrations/ │ ├── data_inserts/ │ ├── data_migrations/ │ ├── rollback_scripts/ │ ├── schema_changes/ │ │── sql_queries/ │ ├── exercises/ │ │── .gitignore │── README.md
✅ Database normalization significantly reduces redundancy, improving query efficiency.
✅ Foreign key constraints enforce referential integrity, preventing orphaned records.
✅ Indexing high-frequency lookup columns enhances performance for complex SQL queries.
✅ Pre-insert sanity checks (e.g., verifying duplicates, ensuring column alignment) prevent migration failures.
✅ Window Functions are a powerful tool for assigning unique identifiers in data migration scenarios.
✅ Git versioning is essential for maintaining schema history and tracking database modifications.
✅ Git repository cleanup and organization is critical for clarity and maintainability.
This project (Project 2A: Database Normalization & Execution) serves as a foundation for the next phase:
🚀 Project 2B: Advanced SQL Querying
- Writing complex SQL queries, including Joins, Common Table Expressions (CTEs), and Window Functions.
- Performance tuning, leveraging EXPLAIN ANALYZE for query optimization.
- Financial reporting and trend analysis, creating queries for budget tracking, spend patterns, and forecasting.
🚀 Project 3A: Python ETL Workflow Conversion
- Rebuilding the database schema via Python, transitioning from manual SQL migrations to Python-based ETL workflows.
- Automating ETL pipelines, using Pandas, SQLAlchemy, and psycopg2 for database integration.
- Implementing Apache Airflow, scheduling incremental data ingestion and workflow orchestration.
- Expanding Git repository automation, improving tagging strategies and CI/CD integration.
This project was an invaluable learning experience in:
- Database design, normalization, and migration
- SQL performance optimization
- Git repository cleanup and best practices in version control
This structured approach ensures Project 2A is a strong foundation for automation in Project 3A.
Looking ahead, the focus shifts from manual SQL workflows to fully automated ETL solutions.
💡 Special Note:
This repository is purposefully structured to showcase my transition from manual SQL-based database management (Project 2A) to fully automated ETL pipelines in Python (Project 3A).
By maintaining a well-documented version history, this project serves as a reference for industry best practices in database engineering, automation, and workflow orchestration. 🚀