Welcome to my GitHub profile! I'm Shubham Goyal, a Data Analyst with expertise in Power BI, Excel, Python, SQL, NoSQL, Machine Learning, and Deep Learning. I love turning raw data into meaningful insights and helping businesses make data-driven decisions.
- Power BI/Tableau
- Excel
- SQL
- NoSQL
- Python
- Time Series Analysis
- Hypothesis Testing
- Machine Learning
- Deep Learning
- Power BI/Tableau: Creating interactive and insightful dashboards and reports.
- Excel: Advanced data analysis, pivot tables, VLOOKUP, Power Pivot.
- SQL (MySQL): Writing complex queries for data manipulation and retrieval, database management, and performance tuning.
- Python: Data cleaning and manipulation, data visualization, EDA, scripting for data processing, analysis and prediction.
- Jupyter Notebook/Google Colab: Interactive coding environments for data analysis, visualization, and machine learning experiments.
- MongoDB (via MongoDB Compass): NoSQL database for handling large-scale unstructured data, performing aggregations, and managing document-based storage.
- Visual Studio Code: Integrated Development Environment (IDE) for coding, debugging, and building applications in multiple languages, such as Python.
Here are some of the projects I've worked on:
- Tools Used: NLP, Machine Learning (SVC, Logistic Regression, MultinomialNB)
- Description: Built a spam detection model to classify messages as ham or spam. Tested multiple classifiers, with SVC achieving 98% accuracy and a 1.0 precision score. Implemented preprocessing techniques to enhance text classification performance.
- Repository: [Spam Detective]
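
The classification idea behind the spam project can be sketched with a tiny multinomial Naive Bayes classifier, one of the model families listed above. This is a minimal illustration and not the project's code: the toy messages, the whitespace tokenizer, and the helper names `train_nb` and `predict_nb` are all assumptions made for the example.

```python
import math
from collections import Counter

def train_nb(messages, labels):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data, made up for illustration only.
toy_messages = [
    "win a free prize now",
    "free cash offer claim now",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
]
toy_labels = ["spam", "spam", "ham", "ham"]

model = train_nb(toy_messages, toy_labels)
print(predict_nb(model, "claim your free prize"))
```

A real pipeline would add proper text preprocessing and a held-out test set before reporting accuracy or precision.
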
- Tools Used: NLP, Logistic Regression, MultinomialNB, Random Forest
- Description: Built a text classification model to predict whether a user would recommend a product based on review text. Achieved 89% accuracy using Logistic Regression. Extracted and visualized key themes from customer reviews to gain insights into user preferences.
- Repository: [Fashion Kart Analysis]
- Tools Used: NLP, Logistic Regression, MultinomialNB, Random Forest, SVC
- Description: Developed a sentiment classification model to analyze customer perceptions of Virgin Airlines on social media. Logistic Regression achieved the best accuracy of 84%. Created a word frequency chart to visualize commonly used words in positive and negative tweets.
- Repository: [Sentiment Analysis]
- Tools Used: Time Series Analysis, MA, ARMA, ARIMA, SARIMA, Smoothing Models
- Description: Analyzed Netflix stock prices (2018-2022) using various time series models to predict future trends. Evaluated multiple models, with Double Exponential Smoothing achieving the lowest RMSE of 9.81. Provided insights into stock price patterns and volatility.
- Repository: [Time Series Analysis]
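
As a rough illustration of how double exponential smoothing and RMSE fit together, here is a minimal sketch of Holt's linear trend method with one-step-ahead forecasts. The price values, the smoothing parameters `alpha` and `beta`, and the initialization (first value as level, first difference as trend) are assumptions chosen for the example, not the settings used in the project.

```python
import math

def double_exponential_smoothing(series, alpha, beta):
    """One-step-ahead forecasts from Holt's linear trend method.

    Returns one forecast for each value in series[1:].
    """
    level = series[0]
    trend = series[1] - series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level + trend)  # forecast made before seeing `value`
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return forecasts

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical closing prices, made up for illustration only.
prices = [100.0, 103.0, 101.5, 106.0, 108.5, 107.0, 111.0]
forecasts = double_exponential_smoothing(prices, alpha=0.5, beta=0.3)
print(f"RMSE: {rmse(prices[1:], forecasts):.2f}")
```

Comparing this RMSE against the same metric for ARIMA or SARIMA forecasts on a held-out period is one way to rank the candidate models.
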
- Tools Used: Hypothesis Testing, Confidence Interval Estimation, ANOVA, t-Test, Multiple Regression
- Description:
- Confidence Interval Estimation: Constructed 90%, 95%, and 99% confidence intervals for average dental claim reimbursements.
- Multiple Regression Analysis: Evaluated the impact of price per square foot, bathrooms, and floor size on housing prices.
- ANOVA Analysis: Compared tensile strengths of rubber seals across six machines and sorption rates of solvents.
- t-Test for Smoking & Heart Attack Age: Conducted a two-sample t-test to analyze heart attack age differences between smokers and non-smokers.
- Repository: [Hypothesis Testing]
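
The confidence interval step can be sketched with the standard library alone. This example uses a normal approximation for the critical value, which is an assumption; with small samples a t-distribution is usually more appropriate. The reimbursement figures and the helper name `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence):
    """Normal-approximation confidence interval for the sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    center = mean(sample)
    margin = z * stdev(sample) / math.sqrt(len(sample))
    return center - margin, center + margin

# Hypothetical reimbursement amounts, made up for illustration only.
claims = [112.0, 98.5, 130.0, 105.25, 121.75, 99.0, 117.5, 108.0]

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(claims, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Higher confidence levels produce wider intervals around the same sample mean.
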
- Tools Used: Hypothesis Testing, Statistical Analysis
- Description: Compared Facebook and AdWords ad campaigns based on clicks, conversions, and cost-effectiveness to optimize ROI. Identified a stable relationship between ad spend and conversions while highlighting the influence of external factors such as market conditions and audience behavior.
- Repository: [Hypothesis Testing]
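
One common way to compare two ad campaigns on conversion rate is a two-proportion z-test. The sketch below is an assumption about how such a comparison could be set up, not the test used in the project; the click and conversion counts are invented, and `two_proportion_z_test` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conversions_a, clicks_a, conversions_b, clicks_b):
    """Two-sided z-test for a difference in conversion rates."""
    rate_a = conversions_a / clicks_a
    rate_b = conversions_b / clicks_b
    pooled = (conversions_a + conversions_b) / (clicks_a + clicks_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / clicks_a + 1 / clicks_b))
    z = (rate_a - rate_b) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical campaign totals, made up for illustration only.
z, p_value = two_proportion_z_test(
    conversions_a=120, clicks_a=1000,  # campaign A
    conversions_b=80, clicks_b=1000,   # campaign B
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the conversion rates differ by more than chance alone would explain; cost per conversion still needs to be compared separately.
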
- Tools Used: Association Rule Mining, Market Basket Analysis
- Description: Analyzed transaction data to identify frequently bought-together items and optimize product placement. Discovered key product combinations (e.g., rolls/buns & whole milk, yogurt & whole milk, soda & vegetables) to enhance visibility and sales. Identified customer purchase patterns by day, aiding in inventory and marketing strategies.
- Repository: [Market Basket Analysis]
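
The metrics behind "frequently bought-together" items (support, confidence, and lift) can be sketched for item pairs in plain Python. The baskets below are made up and only reuse product names from the description; `pair_rules` is a hypothetical helper, and a full analysis would normally use an algorithm such as Apriori over larger itemsets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support):
    """Score every item pair (a -> b) by support, confidence, and lift."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in transactions
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / item_counts[a]      # P(b | a)
        lift = confidence / (item_counts[b] / n)  # P(b | a) / P(b)
        rules.append(
            {"antecedent": a, "consequent": b,
             "support": support, "confidence": confidence, "lift": lift}
        )
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)

# Hypothetical baskets, made up for illustration only.
baskets = [
    ["whole milk", "yogurt"],
    ["whole milk", "yogurt", "rolls/buns"],
    ["soda"],
    ["whole milk", "rolls/buns"],
]
for rule in pair_rules(baskets, min_support=0.5):
    print(rule)
```

A lift above 1 means the two items appear together more often than they would if purchases were independent.
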
- Tools Used: Power BI, SQL
- Description: Developed a comprehensive dashboard to track sales performance, visualize key metrics, and identify trends.
- Repository: Sales Performance Dashboard
- Tools Used: Excel, Power BI
- Description: Performed customer segmentation using clustering techniques to enhance marketing strategies.
- Repository: Hotel Analysis
- Tools Used: SQL, Excel
- Description: IPL Insights: An interactive Power BI dashboard analyzing IPL statistics and trends.
- Repository: IPL Insights
- Tools Used: Excel (Power Pivot, pivot tables)
- Description: Comprehensive Excel report analyzing sales performance, trends, and key metrics.
- Repository: Sales Analysis
- Tools Used: Excel (Power Pivot, pivot tables)
- Description: Interactive Power BI dashboard providing insights into pizza sales performance and trends.
- Repository: Pizza Sales Analysis
- Tools Used: SQL
- Description: SQL-based reports providing detailed insights into hardware sales.
- Repository: AtliQ Hardware Reports
- Tools Used: Excel (Power Pivot, pivot tables)
- Description: Comprehensive Power BI dashboard offering marketing, finance, supply chain, and executive views.
- Repository: Business 360 Analysis
"Data is the new oil. It's valuable, but if unrefined it cannot really be used." - Clive Humby
Thank you for visiting my GitHub profile. Feel free to explore my repositories and reach out if you have any questions or collaboration ideas!