Applying Unsupervised Learning to Identify Customer Segments
In this project, you will work with real-life data provided to us by our Bertelsmann partners AZ Direct and Arvato Finance Solution. The data here concerns a company that performs mail-order sales in Germany. Their main question of interest is to identify facets of the population that are most likely to be purchasers of their products for a mailout campaign. Your job as a data scientist will be to use unsupervised learning techniques to organize the general population into clusters, then use those clusters to see which of them comprise the main user base for the company. Prior to applying the machine learning methods, you will also need to assess and clean the data in order to convert the data into a usable form.
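The assessment and cleaning step can be illustrated with a small sketch. It measures the share of missing values per column and per row and drops the sparsest ones. It assumes unknown values have already been converted to `NaN`. The function name `clean_demographics` and the 30% / 10% thresholds are hypothetical choices, not values taken from the project.

```python
import numpy as np
import pandas as pd


def clean_demographics(df: pd.DataFrame,
                       col_thresh: float = 0.3,
                       row_thresh: float = 0.1) -> pd.DataFrame:
    """Drop sparse columns, then sparse rows (hypothetical thresholds)."""
    # Fraction of missing values in each column
    col_missing = df.isnull().mean()
    # Keep only columns whose missing fraction is at most col_thresh
    df = df.loc[:, col_missing <= col_thresh]
    # Fraction of missing values in each row, over the remaining columns
    row_missing = df.isnull().mean(axis=1)
    # Keep only rows whose missing fraction is at most row_thresh
    return df.loc[row_missing <= row_thresh].copy()


# Tiny synthetic example (not the real project data)
demo = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [np.nan, np.nan, np.nan, 1],  # 75% missing, so the column is dropped
    "C": [1, np.nan, 3, 4],            # 25% missing, so the column is kept
})
cleaned = clean_demographics(demo)
print(cleaned)
```

Dropping columns before rows matters here. Otherwise a mostly empty column such as `B` would push nearly every row over the row threshold.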
The project uses the following Python libraries:
- NumPy
- pandas
- scikit-learn (sklearn)
- Matplotlib (for data visualization)
- Seaborn (for data visualization)
This project uses two datasets. The first describes the general population of Germany, and the second describes customers of the mail-order sales company.
The project proceeds in the following steps:
- Load the data
- Preprocessing (data cleaning and feature engineering)
- Feature transformation (feature scaling and PCA)
- Interpret the principal components
- Clustering
- Compare the customer data to the general demographics data
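The transformation, clustering and comparison steps can be sketched with scikit-learn on synthetic stand-in data. The sketch assumes the cleaned data is numeric. The feature names `feat_*`, median imputation, `n_components=3` and `n_clusters=4` are illustrative assumptions, not the project's actual choices. Every transformation and the clustering model are fitted on the general population only and then reused unchanged on the customer data, so both datasets land in the same clusters.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cols = [f"feat_{i}" for i in range(6)]  # hypothetical feature names

# Synthetic stand-ins for the two datasets (NOT the real data)
general = pd.DataFrame(rng.normal(size=(1000, 6)), columns=cols)
customers = pd.DataFrame(rng.normal(loc=0.5, size=(200, 6)), columns=cols)
general.iloc[::50, 0] = np.nan  # a few missing values to impute

# Impute, scale, then reduce dimensionality with PCA (fit on general only)
prep = make_pipeline(SimpleImputer(strategy="median"),
                     StandardScaler(),
                     PCA(n_components=3))
X_gen = prep.fit_transform(general)
X_cust = prep.transform(customers)

# Interpret the first principal component: features sorted by weight
pca = prep.named_steps["pca"]
weights = pd.Series(pca.components_[0], index=cols).sort_values()
print(weights)
print("cumulative explained variance:", pca.explained_variance_ratio_.cumsum())

# Cluster the general population, then assign customers to those clusters
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_gen)
gen_labels = kmeans.labels_
cust_labels = kmeans.predict(X_cust)

# Share of each dataset that falls into each cluster
k = kmeans.n_clusters
gen_share = np.bincount(gen_labels, minlength=k) / len(gen_labels)
cust_share = np.bincount(cust_labels, minlength=k) / len(cust_labels)
comparison = pd.DataFrame({"general": gen_share, "customers": cust_share})
comparison["ratio"] = comparison["customers"] / comparison["general"]
print(comparison)
```

Clusters with a ratio well above 1 hold proportionally more customers than the general population. Under this setup they are candidates for the company's core user base. Clusters well below 1 point to segments the company reaches less often.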