diff --git a/CMP3751 Machine Learning/workshops/Week 7/Workshop 6.ipynb b/CMP3751 Machine Learning/workshops/Week 7/Workshop 6.ipynb
new file mode 100644
index 0000000..fcaf163
--- /dev/null
+++ b/CMP3751 Machine Learning/workshops/Week 7/Workshop 6.ipynb
@@ -0,0 +1,239 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " __Machine Learning Workshp 6 : Dimensionality Reduction__.\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ " Welecome to our sixth workshop!
\n",
+ "\n",
+ "In this workshop we will investigate the fundamentals of the __Dimensionality Reduction__ and how to apply its concept through Principal Component Analysis."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# The wine dataset\n",
+ "It is a classic and very easy __multi-class__ classification dataset.\n",
+ "It consists of __three (3)__ classes, __178 total instances__ and __13 real, positive__ features:\n",
+ "1. It is commonly associated with the __UCL Machine Learning Repository__.\n",
+ "1. The classes represent __three different types__ of wine.\n",
+ "1. There are __13 different chemical measurements (features)__ describing each wine sample in terms of alcohol content, malic acid concentration, color intensity, etc.\n",
+ "1. The dataset typically consists of __178 samples__, with each sample corresponding to one bottle of wine.\n",
+ "1. Typically, the data in the wine dataset is __preprocessed__ by __standardizing__ the features to have __zero mean__ and __unit variance__. This preprocessing step __ensures__ that each __feature contributes equally__ to the analysis.\n",
+ "\n",
+ "__Return Arguments:__\n",
+ "1. __data__ as ndarray/dataframe of shape $(178,13)$\n",
+ "1. __target__ as ndarray/series of shape $(178,)$\n",
+ "1. __feature_names__ as a list of the names of the dataset columns\n",
+ "\n",
+ "You could load and use the __wine dataset__ as follows:\n",
+ "```python\n",
+ "from sklearn.datasets import load_wine\n",
+ "# Load the Wine dataset\n",
+ "wine = load_wine()\n",
+ "data = wine.data\n",
+ "target = wine.target\n",
+ "feature_names = wine.feature_names\n",
+ "```\n",
+ "\n",
+ "The __data pre-processing / standardization__ is being performed as follows:\n",
+ "```python\n",
+ "# Standardize the data (mean=0, variance=1)\n",
+ "mean = np.mean(data, axis=0)\n",
+ "std_dev = np.std(data, axis=0)\n",
+ "data_standardized = (data - mean) / std_dev\n",
+ "```"
+ ]
+ },
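+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of how you might sanity-check the loaded and standardized data (this assumes the __`data`__, __`target`__ and __`data_standardized`__ variables from the snippets above):\n",
+ "```python\n",
+ "# Shapes should be (178, 13) for the features and (178,) for the labels\n",
+ "print(data.shape, target.shape)\n",
+ "\n",
+ "# After standardization every feature should have (approximately) zero mean and unit variance\n",
+ "print(data_standardized.mean(axis=0).round(6))\n",
+ "print(data_standardized.std(axis=0).round(6))\n",
+ "```"
+ ]
+ },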
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Principal Component Analysis / PCA\n",
+ "1. It is a class belonging to the __```sklearn.decomposition```__.\n",
+ "1. it performs __linear dimensionality reduction__ using __Singular Value Decomposition / SVD of the data to __project__ it to a __lower dimensional__ space.\n",
+ "
\n",
+ "
\n",
+ "Its __key parameters__ are:\n",
+ "1. __`n_components1:__ This parameter specifies the number of principal components to retain. You can set it to an integer or a float between 0.0-1.0, indicating the __desired variance__ explained.\n",
+ "1. __`whiten`:__ When set to __'True'__. it whitens the data (decorrelates features) during the transformation.\n",
+ "\n",
+ "Its __output arguments__ are:\n",
+ "1. __`components_`:__ This attribute contains the principal components themselves as row vectors.\n",
+ "1. __`explained variance ratio_`:__ This attribute provides the ratio of variance explained by each of the selected components.\n",
+ "\n",
+ "Its __methods__ are:\n",
+ "1. __`fit(X)`:__This method computes the principal components for the input data __`X`__.\n",
+ "1. __`transform(X)`:__ It transforms the input data __`X`__ into the new coordinate system defined by the __principal components__.\n",
+ "1. __`inverse_transform(X_reduced)`:__ This method allows you to transform reduced data back to the original feature space.\n",
+ "\n",
+ "__Useful Hint:__ We can use the __`fit`__ and __`transform`__ methods of PCA simultaneously, when we want to __perform dimensionality reduction__ and __obtain the transformed data__ in __one step__. The __`fit_transform`__ method is available for this purpose, combining the __fitting and transformation steps__ into a __single call__as follows:\n",
+ "```python\n",
+ "from sklearn.decomposition import PCA\n",
+ "\n",
+ "# Create a PCA object with the desired number of components\n",
+ "pca = PCA(n_components=2)\n",
+ "\n",
+ "# Fit and transform the data in one step\n",
+ "X_reduced = pca.fit_transform(X)\n",
+ "```"
+ ]
+ },
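+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once the model has been fitted, the attributes and methods listed above can be inspected directly. A minimal sketch, assuming the __`pca`__ object and the reduced data __`X_reduced`__ from the snippet above:\n",
+ "```python\n",
+ "# The principal components themselves (one row vector per retained component)\n",
+ "print(pca.components_.shape)\n",
+ "\n",
+ "# Fraction of the total variance captured by each component\n",
+ "print(pca.explained_variance_ratio_)\n",
+ "\n",
+ "# Map the reduced data back to the original feature space (an approximation)\n",
+ "X_restored = pca.inverse_transform(X_reduced)\n",
+ "print(X_restored.shape)\n",
+ "```"
+ ]
+ },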
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 3D Visualization of the Principal Components\n",
+ "The __`mpl_tookkits.mplot3d`__ module includes classes and functions for creating various types of 3D plots. We import the __`Axes3D`__ class from the __`mpl_tookkits.mplot3d`__ module. This class is used to __create 3D axes__ for three-dimensional plots:\n",
+ "```python\n",
+ "from mpl_toolkits.mplot3d import Axes3D\n",
+ "```\n",
+ "The following line of code is used to create a new figure with size of 8 inches in width and 6 inches in height:\n",
+ "```python\n",
+ "plt.figure(figsize=(8, 6))\n",
+ "```\n",
+ "The __```plt.figure()```__ function is used to __create a new figure__ for the plot.\n",
+ "The __```figsize```__ allows you to set the dimensions of the figure.\n",
+ "\n",
+ "Then, a list called __```colors```__ is created, which contains three color codes. These colors will be used to distinguish data points for different classes in the 3D scatter plot:\n",
+ "```python\n",
+ "colors = ['r', 'g', 'b']\n",
+ "```\n",
+ "\n",
+ "Finally, the __```wine.target_names```__ variable contains the class labels or names for the data points in the wine dataset. It __represents__ the different types of wine. This line assigns these class names to the __```targets```__ variable:\n",
+ "```python\n",
+ "targets = wine.target_names\n",
+ "```\n",
+ "The __```plt.figure()```__ function creates a new, empty figure which is the basis for our visualization.\n",
+ "The __```ax```__ is a variable that represents the subplot(axis) you are adding to the figure. It's often referred to as the __`axes`__ object. This is where you will create and customize your 3D plot.\n",
+ "\n",
+ "The __```ax = fig.add_subplot(111, projection='3d')```__ is used to add the 3D subplot:\n",
+ "1. The __```111```__ part specifies that you are adding a subplot in a grid with one row, one column and it is the first (and only) subplot.\n",
+ "1. The __```projection='3d' ```__ part tells Matplotlib that the subplot should be in 3D projection, which is necessary for creating 3D plots. The __`projection`__ parameter is set to __`3D`__ to enable 3D plotting capabilities. "
+ ]
+ },
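+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Putting the pieces above together, a minimal sketch of the 3D figure setup (it assumes `matplotlib.pyplot` is imported as __`plt`__ and that the __`wine`__ object from `load_wine()` is available):\n",
+ "```python\n",
+ "import matplotlib.pyplot as plt\n",
+ "from mpl_toolkits.mplot3d import Axes3D\n",
+ "\n",
+ "# Create an 8 x 6 inch figure and attach a single 3D subplot to it\n",
+ "fig = plt.figure(figsize=(8, 6))\n",
+ "ax = fig.add_subplot(111, projection='3d')\n",
+ "\n",
+ "# One color per wine class, plus the class names used for the legend\n",
+ "colors = ['r', 'g', 'b']\n",
+ "targets = wine.target_names\n",
+ "```"
+ ]
+ },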
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# How to create a 3D scatter plot for visualizing the principal components\n",
+ "1. Firstly, we initiate a __`for loop`__ that iterates over pairs of __`target`__ and __`color`__ obtained by __zipping__ two __lists__: __`targets`__ and __`colors`__. The `targets`__ list contains the target class names and __`colors`__ list contains the corresponding colors for the scatter plot.\n",
+ "```python\n",
+ "for target, color in zip(targets, colors):\n",
+ "```\n",
+ "1. The line __```indices_to_keep = reduced_df['Target'] == wine.target_names.tolist().index(target)```__ creates a boolean mask __```indices_to_keep```__ that is __`True`__ for rows in the DataFrame __`reduced_df`__, where the __`Target`__ column (the original class labels) matches the index of the current __`target`__ class in the list of class names __`wine.target_names`__. It helps filter the data points belonging to the current target class.\n",
+ "1. The following lines generate a scatter plot for the data points belonging to the current __`target`__ class. It uses the 3D axes __`ax`__ for plotting: \n",
+ "```python\n",
+ "ax.scatter(reduced_df.loc[indices_to_keep, 'PC1'],\n",
+ " reduced_df.loc[indices_to_keep, 'PC2'],\n",
+ " reduced_df.loc[indices_to_keep, 'PC3'],\n",
+ " c=color,\n",
+ " label=target)\n",
+ "```\n",
+ "1. The code segment \n",
+ "```python\n",
+ "reduced_df.loc[indices_to_keep, 'PC1'],\n",
+ " reduced_df.loc[indices_to_keep, 'PC2'],\n",
+ " reduced_df.loc[indices_to_keep, 'PC3'],\n",
+ "```\n",
+ "select the data points (PC1, PC2, and PC3) for the current target class using the boolean mask __`indices_to_keep`__.\n",
+ "1. The __`c=color`__ specifies the color of the points.\n",
+ "1. The __`label=target`__ provides a label for the legend."
+ ]
+ },
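+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Putting these steps together, here is a minimal sketch of the full plotting loop. It assumes a DataFrame __`reduced_df`__ with the columns __`'PC1'`__, __`'PC2'`__, __`'PC3'`__ and __`'Target'`__ (as described above), plus the __`ax`__, __`colors`__ and __`targets`__ variables from the previous cells:\n",
+ "```python\n",
+ "# Plot each wine class in its own color\n",
+ "for target, color in zip(targets, colors):\n",
+ "    indices_to_keep = reduced_df['Target'] == wine.target_names.tolist().index(target)\n",
+ "    ax.scatter(reduced_df.loc[indices_to_keep, 'PC1'],\n",
+ "               reduced_df.loc[indices_to_keep, 'PC2'],\n",
+ "               reduced_df.loc[indices_to_keep, 'PC3'],\n",
+ "               c=color,\n",
+ "               label=target)\n",
+ "\n",
+ "# Label the axes and show the legend\n",
+ "ax.set_xlabel('PC1')\n",
+ "ax.set_ylabel('PC2')\n",
+ "ax.set_zlabel('PC3')\n",
+ "ax.legend()\n",
+ "plt.show()\n",
+ "```"
+ ]
+ },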
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Exercise 1\n",
+ "In this exercise you will use the code and knowledge from the aforementioned cells to perform dimensionality reduction in the Wine Dataset. More specifically, you will perform the following steps:\n",
+ "1. Load the Wine dataset and differentiate the features and the output in the variables __`data`__ and __`target`__ respectively.\n",
+ "1. Then standardize the data so as to have zero mean value and unit variance.\n",
+ "1. Declare an object of the PCA class from the sklearn.decomposition. This model will be initialized so as to retain 3 components.\n",
+ "1. Then use the __`fit_transform()`__ function to fit and transform the data to a new __`reduced_data`__ function.\n",
+ "1. Then create a __`reduced_df`__ DataFrame to visualize the reduced data. \n",
+ "1. Once you created the DataFrame, you should add now the target variable to the DataFrame.\n",
+ "1. Then, plot this DataFrame in 3D (You may use the previous code for this).\n",
+ "1. Estimate the explained variance ratio by using the __`explained_variance_ratio`__ attribute.\n",
+ "1. Print the explained variance ratio for each component and the total variance explained."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Place your code here"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# A goodbye Exercise\n",
+ "This is an __optional__ exercise and a way to thank you for your collaboration this year.\n",
+ "Although it is a __strange gift__, it serves to integrate the knowledge taught in a single exercise/project.\n",
+ "Hopefully, it would be a crash test for the succession of the lessons and workshops.\n",
+ "More specifically:\n",
+ "1. Load the Breast Cancer dataset from sklearn.datasets.\n",
+ "1. Split the dataset into training and testing sets\n",
+ "1. Train and evaluate k-NN and SVM models on the original data\n",
+ "1. Define a function to perform PCA and evaluate models\n",
+ "1. Evaluate and return results (accuracy, precision, recall, F1)\n",
+ "1. Compare the results before and after PCA\n",
+ "1. Display the results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# References\n",
+ "[UV Irvine Machine Learning Repository](https://archive.ics.uci.edu/)\n",
+ "
\n",
+ "[PCA documentation](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}