This project develops and evaluates machine learning models for classifying credit scores into distinct categories. Using a preprocessed dataset, we build robust models that predict creditworthiness from financial attributes. The primary focus is on achieving high accuracy and a strong ROC-AUC score, a key metric for assessing the quality of multi-class classifiers.
The dataset utilized in this project is the Credit Score Classification Dataset available on Kaggle. This dataset includes various features relevant to credit scoring and has been preprocessed to handle missing values and ensure overall data quality.
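Loading and splitting such a dataset typically looks like the sketch below. The file name, target column, and imputation strategy are illustrative assumptions, not the actual Kaggle schema:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_split(csv_path, target="Credit_Score"):
    """Load the data, impute any residual missing values, and split 70/30."""
    df = pd.read_csv(csv_path)
    # Fill remaining gaps: median for numeric columns, mode for the rest.
    for col in df.columns:
        if df[col].isna().any():
            fill = df[col].median() if df[col].dtype.kind in "if" else df[col].mode()[0]
            df[col] = df[col].fillna(fill)
    X = df.drop(columns=[target])
    y = df[target]
    # Stratify so each credit-score class keeps its share in both splits.
    return train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
```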
We have implemented and evaluated two advanced machine learning models:
- Random Forest Classifier 🌲
- Gradient Boosting Classifier 🚀
- Random Forest Classifier 🌲:
  - Hyperparameters Tuned:
    - `n_estimators`: Number of trees in the forest
    - `max_features`: Number of features to consider for splitting
    - `max_depth`: Maximum depth of the trees
    - `criterion`: The function to measure the quality of a split (`gini` or `entropy`)
  - Best Parameters:
    - `n_estimators`: 300
    - `max_features`: 'auto'
    - `max_depth`: 10
    - `criterion`: 'entropy'
- Gradient Boosting Classifier 🚀:
  - Hyperparameters Tuned:
    - `n_estimators`: Number of boosting stages to run
    - `learning_rate`: Shrinks the contribution of each tree
    - `max_depth`: Maximum depth of the individual trees
  - Best Parameters:
    - `n_estimators`: 150
    - `learning_rate`: 0.1
    - `max_depth`: 4
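Grids like the ones above can be searched with scikit-learn's `GridSearchCV`. The sketch below shows the Random Forest search; the grid values, fold count, and variable names are illustrative assumptions, not the exact search used here:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values per hyperparameter (illustrative grid).
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_features": ["sqrt", "log2"],  # 'auto' is deprecated in recent sklearn
    "max_depth": [5, 10, None],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc_ovr",  # one-vs-rest ROC-AUC for the 3-class problem
    cv=3,
    n_jobs=-1,
)
# search.fit(X_train, y_train); search.best_params_ then holds the winning cell.
```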
Model performance is evaluated using multiple metrics, with a particular emphasis on the ROC-AUC score, which provides insight into the model's ability to distinguish between classes across different thresholds.
- Random Forest Classifier 🌲:
  - Accuracy: 72.98%
  - ROC-AUC Score: 0.873
  - Confusion Matrix:
    ```
    [[ 4044    82  1196]
     [  678  6030  2097]
     [ 2229  1824 11820]]
    ```
  - Classification Report:
    ```
                  precision    recall  f1-score   support

               0       0.58      0.76      0.66      5322
               1       0.76      0.68      0.72      8805
               2       0.78      0.74      0.76     15873

        accuracy                           0.73     30000
       macro avg       0.71      0.73      0.71     30000
    weighted avg       0.74      0.73      0.73     30000
    ```
- Gradient Boosting Classifier 🚀:
  - Accuracy: 73.17%
  - ROC-AUC Score: 0.874
  - Confusion Matrix:
    ```
    [[ 3763    86  1473]
     [  493  5982  2330]
     [ 1764  1902 12207]]
    ```
  - Classification Report:
    ```
                  precision    recall  f1-score   support

               0       0.63      0.71      0.66      5322
               1       0.75      0.68      0.71      8805
               2       0.76      0.77      0.77     15873

        accuracy                           0.73     30000
       macro avg       0.71      0.72      0.71     30000
    weighted avg       0.73      0.73      0.73     30000
    ```
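Metrics like those above can be reproduced with scikit-learn; `roc_auc_score` needs predicted class probabilities and `multi_class="ovr"` for the three-class setup. A minimal sketch (variable names are assumptions):

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

def evaluate(model, X_test, y_test):
    """Compute the four metrics reported above for a fitted classifier."""
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)  # class probabilities for ROC-AUC
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # One-vs-rest ROC-AUC averaged over the three credit-score classes.
        "roc_auc": roc_auc_score(y_test, y_proba, multi_class="ovr"),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred),
    }
```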
The ROC-AUC score reflects the models' ability to distinguish between classes across various thresholds. Higher ROC-AUC scores indicate better model performance.
The ROC curves for each class are plotted to visualize the true positive rate versus the false positive rate. This helps in understanding how well each model performs in distinguishing between different credit score categories.
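Per-class ROC curves can be produced by binarizing the labels one-vs-rest and plotting each class's curve. A minimal matplotlib sketch, assuming a fitted model with `predict_proba` and classes 0–2:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def plot_roc_curves(model, X_test, y_test, classes=(0, 1, 2)):
    """Plot one ROC curve per credit-score class (one-vs-rest)."""
    y_bin = label_binarize(y_test, classes=list(classes))
    y_proba = model.predict_proba(X_test)
    fig, ax = plt.subplots()
    for i, cls in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_proba[:, i])
        ax.plot(fpr, tpr, label=f"class {cls} (AUC = {auc(fpr, tpr):.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    return fig
```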