This repository showcases an end-to-end process for building an Azure SQL database, performing data analysis, and visualizing insights. It includes the database architecture design (Entity-Relationship Diagram) created in Lucidchart, SQL scripts for table creation and initial data inserts, and the deployment setup for the taxdatabase on Azure SQL. It also features data visualization and dashboard creation in Tableau.
The project defines a relational database schema (Tax-Expense Schema) to efficiently manage tax-related expenses. Expenses are categorized into dimensions such as courses, conferences, and travel, with calculated columns for automated data handling. The goal is to extract actionable insights that help optimize budgets, track financial trends, and support strategic decision-making.
The focus was on developing a Custom Query & Analysis Tool, built with Flask or Streamlit, for dynamic querying of tax and expense data.
In the future, Predictive Analytics & Insights using machine learning (scikit-learn) will be implemented to forecast tax-deductible expenses and detect anomalies in expense patterns.
- SQL Server on Azure: Database hosting and management.
- Azure Data Studio: Query development and data exploration.
- Tableau: Visualization of insights and reporting.
- Notepad++: Writing and editing SQL code.
- Lucidchart: Designing the database schema and entity relationships.
- Python frameworks (Flask or Streamlit): Dynamic querying and forecasting.
As a data scientist, I applied the following skills:
- Data Modeling: Designing a relational database for structured storage of expense data.
- Database Development from scratch.
- Database Management.
- Data Wrangling: Writing SQL and Python queries to clean, transform, and prepare data for analysis.
- Data Retrieval + Analysis: Extracting meaningful patterns and trends from large datasets.
- Reporting and Visualization: Presenting results with dynamic dashboards.
- Designed Entity-Relationship Diagram (ERD), defining tables, establishing relationships, and normalizing data to ensure consistency, accuracy, and scalability.
- Database Setup:
- Designed a database schema to manage expenses with tables for travel, conferences, meetings, and more.
- Deployed the database on SQL Server (Azure).
- Data Exploration:
- Populated the database with sample data for business travel and related expenses.
- Wrote SQL queries to calculate total expenses, tips, and other key metrics.
- Data Retrieval and Analysis:
- Performed cost analysis based on event type, location, and year.
- Identified high-cost trends and potential areas for savings.
- Data Visualization:
- Migrated the data to Tableau to create dashboards with insights such as:
- Year-over-year expense comparisons.
- Breakdown of costs by location and event type.
- Trends in business vs. personal travel costs.
🔴 SCAN THE QR CODE HERE TO VIEW THE DATABASE ARCHITECTURE
The database was implemented from scratch using SQL scripts that were initially written and tested in Notepad++.
This approach allowed me to:
➡️ Ensure Data Integrity ➡️ Facilitate Code Versioning and Edits ➡️ Prevent Data Loss
Below is an example of the SQL code I wrote, showing how I structured and populated the database.
CREATE TABLE dbo.DimTravel (
    TravelID INT IDENTITY(1,1) PRIMARY KEY,
    TaxYear INT NOT NULL,
    TravelCity NVARCHAR(100) NOT NULL,
    TravelCountry NVARCHAR(5) NOT NULL,
    TravelYear INT NOT NULL,
    TravelMonth NVARCHAR(50) NOT NULL CHECK (TravelMonth IN (
        'January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December'
    )),
    HotelExpense DECIMAL(10,2) NOT NULL,   -- base hotel cost referenced by the computed total
    HotelTip DECIMAL(10,2) NULL,           -- optional tip, treated as 0.00 when NULL
    TotalExpenseEUR DECIMAL(10,2) AS (HotelExpense + ISNULL(HotelTip, 0.00)) PERSISTED
);
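For completeness, here is a minimal example of populating the table. The values are purely illustrative; the computed TotalExpenseEUR column is filled in automatically from HotelExpense and HotelTip.

INSERT INTO dbo.DimTravel (TaxYear, TravelCity, TravelCountry, TravelYear, TravelMonth, HotelExpense, HotelTip)
VALUES (2023, 'Berlin', 'DE', 2023, 'March', 450.00, 20.00);  -- TotalExpenseEUR is derived as 470.00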
SELECT
TravelMonth,
SUM(TotalExpenseEUR) AS TotalMonthlyExpenses
FROM dbo.DimTravel
GROUP BY TravelMonth
ORDER BY TotalMonthlyExpenses DESC;
SELECT
te.EventCity AS Destination, -- City of travel (basic travel information)
te.TaxYear AS TaxYear, -- Tax year associated with the travel expenses
te.ExpenseAmountEUR AS TravelExpenseAmountEUR, -- Main travel cost in EUR (e.g., hotel)
SUM(COALESCE(ate.ExpenseAmountEUR, 0)) AS TotalAdditionalExpenseAmountEUR, -- Total additional costs associated with the trip
te.ExpenseAmountEUR + SUM(COALESCE(ate.ExpenseAmountEUR, 0)) AS TotalTravelExpenses, -- Total travel costs (main + additional)
COALESCE(ce.ConferenceName, 'N/A') AS ConferenceName -- Name of the associated conference or "N/A" (if no association exists)
FROM
dbo.DimTravelExpenses AS te -- Main table containing travel information
LEFT JOIN
dbo.DimAdditionalTravelExpenses AS ate -- Join with the table of additional travel costs
ON
te.TravelID = ate.TravelID -- Linking additional expenses to the main trip using TravelID
LEFT JOIN
dbo.DimTravelPurposeMapping AS tp -- Join with the table of travel purposes
ON
te.TravelID = tp.TravelID AND tp.PurposeType = 'Conference' -- Filtering travel purpose as "Conference"
LEFT JOIN
dbo.DimConferencesExpenses AS ce -- Join with the conferences table
ON
tp.ConferenceID = ce.ConferenceID -- Linking the conference by its ID
GROUP BY
te.EventCity, -- Grouping results by the city of travel
te.TaxYear, -- Grouping results by the tax year
te.HotelName, -- Grouping by hotel name (if available)
te.ExpenseAmountEUR, -- Grouping by main travel expenses
ce.ConferenceName; -- Grouping by conference name (if available)
- Multiple dimensional tables (e.g., DimCourses, DimConferences, DimTravel).
- Calculated fields for total expenses, tips, and travel distance.
- Foreign key constraints to ensure referential integrity.
- Cascading deletes for dependent data cleanup (see the sketch below).
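As a sketch of how these constraints can be expressed, the example below ties the additional-expenses table to its parent travel record with a cascading delete. The constraint name is illustrative, and the table and column names follow the query shown above; adjust them if the deployed schema differs.

-- Illustrative constraint: deleting a trip also removes its dependent expense rows
ALTER TABLE dbo.DimAdditionalTravelExpenses
    ADD CONSTRAINT FK_AdditionalTravelExpenses_Travel
    FOREIGN KEY (TravelID)
    REFERENCES dbo.DimTravelExpenses (TravelID)
    ON DELETE CASCADE;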
After carefully writing and refining the SQL scripts in Notepad++, I deployed the entire database on Microsoft Azure and managed it using Azure Data Studio.
The process involved:
➡️ Migrating SQL Code ➡️ Populating the Database
Once the database was set up and populated, I wrote SQL queries to extract meaningful insights from the data.
This included retrieving data for analysis:
- Calculating total travel expenses (including additional costs).
- Generating reports on conference and meeting costs.
- Analyzing expenses across years, locations, and event types (an example query is sketched below).
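As one example, a year-over-year comparison can be produced with a simple aggregate. The sketch below reuses the TaxYear and ExpenseAmountEUR columns from Sample Query 2 above and is meant as a starting point rather than the exact report used.

-- Illustrative year-over-year comparison of main travel costs
SELECT
    TaxYear,
    SUM(ExpenseAmountEUR) AS TotalTravelExpensesEUR
FROM dbo.DimTravelExpenses
GROUP BY TaxYear
ORDER BY TaxYear;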
Below, you can see me performing live data analysis, showcasing my workflow and technical skills in action. The recording highlights:
- Real-Time Analysis & Coding
- Demonstration of how I approach tasks, troubleshoot issues, and refine queries.
➡️ To open in a new tab: Right-click the link and select "Open link in new tab" 🆕
The final step of the project was to analyze the data and present it visually. To achieve this, I migrated the database from the Microsoft Azure server to Tableau. This required preparing the data source by connecting to the database, selecting the necessary tables, and configuring joins to define the logical relationships between the data entities. I ensured the joins were optimized to maintain query performance and avoid redundant data retrieval.
Additionally, I implemented calculated fields and filters within Tableau to refine the data for specific analyses and visualizations. This allowed for the creation of dynamic dashboards and detailed visual representations that effectively showcased the results of the entire analysis. The combination of technical precision and visual clarity ensured the insights were both actionable and easy to interpret for stakeholders.
The final stage results are showcased below in GIFs for a clearer view of the project.
🔴 Setting Logical Relationships in Tableau ⬇️
🔴 Below is an example of the Tableau dashboard showcasing:
- Total expenses by year and event type.
- Interactive filters for country, event type, and date.
Integrating Python with SQL streamlined data processing and enabled dynamic querying, allowing users to interact with and analyze expense data in real time. This automation not only improved efficiency but also laid the foundation for future enhancements, such as predictive cost analysis and anomaly detection.
from flask import Flask, request, jsonify
import pyodbc
import pandas as pd
app = Flask(__name__)

# Database connection setup (replace the server and password placeholders with your own values)
DB_CONNECTION_STRING = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=<your_server>;DATABASE=taxdatabase;UID=sqladmin;PWD=<your_password>"

@app.route("/monthly-expenses")
def monthly_expenses():
    # Illustrative endpoint (route name assumed): total expenses per month, mirroring the SQL query shown earlier
    with pyodbc.connect(DB_CONNECTION_STRING) as conn:
        rows = conn.execute("SELECT TravelMonth, SUM(TotalExpenseEUR) AS TotalMonthlyExpenses FROM dbo.DimTravel GROUP BY TravelMonth").fetchall()
    return jsonify([{"TravelMonth": r.TravelMonth, "TotalMonthlyExpenses": float(r.TotalMonthlyExpenses)} for r in rows])

Install the required packages with:
pip install flask flask-cors pyodbc pandas
Basic Flask app (API for querying expenses): here
Pull requests are welcome. For major changes, please open an issue first to discuss your ideas.
This project is licensed under the MIT License. See the LICENSE file for more details.
- Clone the repository.
- Open the SQL script Taxdatabase Script in your preferred SQL environment, such as:
- Azure Data Studio
- SQL Server Management Studio (SSMS)
- Execute the scripts to:
- Create the database schema: Set up all tables, constraints, and relationships.
- Populate the tables with sample data (you can use: Taxdatabase Seed Data Example)
- Analyze the data:
- Use the provided SQL queries (above: "Sample Query 2") or write your own to:
- Retrieve insights (e.g., travel expenses by month).
- Perform advanced analysis for reporting purposes.
- Connect to SQL Server using Python: Establish Connection SQL-PY
Clone this repository:
git clone https://github.com/ninryt/taxdatabase-sql-tableau.git