Project assignment for Data Mining at the University of Limerick
This project revisits questions about secondary school student performance by analysing the data provided in the UCI repository. The data was collected during the 2005-2006 school year from two Portuguese schools by Cortez and Silva (2008).
The data is used to build classification models (decision trees, random forests, support vector machines, etc.) that predict student performance, to build clustering models that visualise student performance based on the features, and to identify the key features that influence it.
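As a rough illustration of the model comparison described above, the following sketch cross-validates a decision tree, a random forest and a support vector machine, then ranks features by random forest importance. It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` as a stand-in for the student data. The hyperparameters and the 5-fold setup are illustrative assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Three candidate classifiers; the SVM is scaled because it is
# sensitive to feature magnitudes.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV accuracy = {score:.3f}")

# Indices of the three most important features according to the forest.
forest = models["random_forest"].fit(X, y)
top_features = forest.feature_importances_.argsort()[::-1][:3]
print("Top feature indices:", top_features.tolist())
```

Cross-validation is used here rather than a single train/test split because the dataset is small, so a single split could give a noisy accuracy estimate.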
- Exploratory data analysis of the dataset.
- Data preparation:
  - Treat the missing values.
  - Strategy for outliers.
  - Create new features from the existing ones [Optional if necessary].
- Predictive modelling: apply three different ML algorithms to build a predictive model for either classification or numeric prediction which is as accurate as possible.
- Short conclusion that presents your final predictive model and summarises your findings. Aim at building a predictive model that is as accurate as possible.
- Apply a clustering algorithm to your dataset and visualise the clustering. Discuss the usefulness of the clustering for better understanding of the underlying patterns in your dataset.
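
The data preparation steps listed above could be sketched as follows, using only the Python standard library. The example fills missing values with the median and then clips outliers to Tukey's fences (1.5 times the interquartile range). Both choices are assumptions made for illustration, not necessarily the strategy adopted in the project. The `absences` values are hypothetical and the helper names `impute_median` and `clip_iqr` are made up for this sketch.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical absence counts; None marks a missing value.
absences = [0, 2, 4, None, 6, 3, 75, 1, None, 5]

filled = impute_median(absences)   # step 1: treat the missing values
cleaned = clip_iqr(filled)         # step 2: limit the effect of outliers

print(filled)
print(cleaned)
```

Clipping keeps every record in the dataset, which matters when the sample is small. Dropping the outlying rows is the usual alternative when the extreme values are believed to be recording errors.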