- Project Overview
- Dependencies
- Data Collection
- Data Preparation
- Data Visualization
- Clustering Techniques
- Results and Insights
- Usage
- Contributing
This project aims to perform customer segmentation using machine learning techniques. By understanding customer behaviors and demographics, businesses can tailor their marketing strategies and enhance customer experience.
The project requires the following Python libraries:
- pandas
- matplotlib
- seaborn
- scipy
- scikit-learn
The dataset used in this project is the 'Mall_Customers.csv' file, which describes each customer with the following columns:
- CustomerID
- Gender
- Age
- Annual Income (k$)
- Spending Score (1-100)
The data preparation steps included:
- Loading the data into a Pandas DataFrame.
- Performing exploratory data analysis (EDA) to understand the data distribution.
- Checking for and handling any missing values.
- Normalizing the data for clustering.
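The steps above could look roughly like the following sketch. It is not the notebook's actual code: the sample rows are made up so the snippet runs on its own (the project would read 'Mall_Customers.csv' instead), and dropping incomplete rows and using `StandardScaler` are assumptions about how missing values and normalization were handled.

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows with the same columns as Mall_Customers.csv.
# In the project this would be: df = pd.read_csv("Mall_Customers.csv")
csv_text = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,
6,Female,22,17,76
"""
df = pd.read_csv(io.StringIO(csv_text))

# Quick EDA: summary statistics and missing-value counts per column.
print(df.describe())
missing = df.isnull().sum()
print(missing)

# Assumption: incomplete rows are simply dropped.
df = df.dropna().reset_index(drop=True)

# Assumption: "normalizing" means standardizing each numeric feature
# to zero mean and unit variance before clustering.
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
scaled = StandardScaler().fit_transform(df[features])
scaled_df = pd.DataFrame(scaled, columns=features)
print(scaled_df.round(2))
```

Scaling matters here because K-Means and Ward linkage rely on Euclidean distances, so a feature with a larger numeric range would otherwise dominate the clusters. `MinMaxScaler` would be a reasonable alternative if values in the range 0 to 1 are preferred.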
To gain insights into the customer data, several visualizations were created:
- Distribution of Gender
- Distribution of Age
- Distribution of Annual Income
- Distribution of Spending Score
- Income Distribution by Gender
Two main clustering techniques were applied:
- K-Means Clustering:
  - Optimal number of clusters was determined using the Elbow Method.
  - Clusters were visualized to interpret the segmentation.
- Hierarchical Clustering:
  - Agglomerative Clustering was performed.
  - Dendrogram was used to visualize the cluster hierarchy.
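A minimal sketch of both techniques follows, using synthetic 2-D points rather than the project's data. The choice of three clusters, the Ward linkage, and the range of k values tried are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Elbow Method: record the within-cluster sum of squares (inertia) for each k.
# Plotting inertias against k and looking for the bend suggests a value of k.
inertias = []
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Assumption: the elbow is at k=3 for this synthetic data.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agglomerative (bottom-up hierarchical) clustering with Ward linkage.
agg_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Linkage matrix for the dendrogram. no_plot=True returns the tree layout
# without drawing; drop it in a notebook to render the figure.
Z = linkage(X, method="ward")
tree = dendrogram(Z, no_plot=True)
```

In a notebook, `plt.plot(range(1, 9), inertias, marker="o")` would show the elbow curve, and a scatter plot colored by `kmeans_labels` would show the resulting segments.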
The clustering analysis revealed distinct customer segments based on spending behavior and income levels. These insights can be used for targeted marketing and improving customer engagement.
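One way to turn cluster labels into describable segments is to summarize each cluster by its size and average income and spending score. The sketch below uses made-up rows and labels purely to show the mechanics; the `Cluster` column name is hypothetical and the numbers are not results from this project.

```python
import pandas as pd

# Hypothetical customers with cluster labels already assigned.
df = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 78, 80, 77, 79],
    "Spending Score (1-100)": [80, 76, 10, 90, 12, 88],
    "Cluster": [0, 0, 1, 2, 1, 2],
})

# Per-cluster profile: number of customers and mean of each feature.
profile = df.groupby("Cluster").agg(
    customers=("Annual Income (k$)", "size"),
    mean_income=("Annual Income (k$)", "mean"),
    mean_spending=("Spending Score (1-100)", "mean"),
)
print(profile)
```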
To run the project, follow these steps:
- Clone the repository:

      git clone https://github.com/yourusername/customer-segmentation.git

- Navigate to the project directory:

      cd customer-segmentation

- Run the Jupyter notebook:

      jupyter notebook Customer_Segmentation.ipynb
Contributions are welcome! Please create a new branch for any changes and submit a pull request for review.