Project Title: NEOCLARA-II: Performance Evaluation of Classification Models
Description: This project focuses on evaluating different classification models for the detection and risk analysis of near-Earth objects (NEOs). It aims to provide insights into model performance using several key evaluation metrics.
The project follows a binary classification approach: each model predicts the risk class of a near-Earth object (NEO), and its ability to distinguish between the two classes is assessed using multiple evaluation metrics.
The following metrics are employed to evaluate model performance:
The confusion matrix is a tabular visualization of the actual vs. predicted classifications. It allows you to see:
- True Positives (TP): Positive instances correctly predicted as positive.
- False Positives (FP): Negative instances incorrectly predicted as positive.
- True Negatives (TN): Negative instances correctly predicted as negative.
- False Negatives (FN): Positive instances incorrectly predicted as negative.
This metric provides a detailed view of where the model makes errors and is fundamental for understanding model behavior.
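As a minimal sketch of how these counts can be obtained with scikit-learn (the `y_true`/`y_pred` arrays below are placeholders standing in for a model's test-set labels and predictions):

```python
from sklearn.metrics import confusion_matrix

# Placeholder labels and predictions (0 = negative class, 1 = positive class);
# in this project they would come from a trained model's test-set output.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  TN={tn}  FN={fn}")  # TP=3  FP=1  TN=3  FN=1
```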
Recall (also known as sensitivity or true positive rate) measures the model’s ability to correctly identify positive instances. It is particularly important when false negatives carry significant consequences, such as failing to identify a potential near-Earth object.
Formula:
$$\text{Recall} = \frac{TP}{TP + FN}$$
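For illustration, recall can be computed from the confusion-matrix counts or with `recall_score` from scikit-learn (same placeholder arrays as above):

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# TP = 3 and FN = 1, so recall = 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))  # 0.75
```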
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds. The area under this curve (AUC-ROC) provides an aggregate measure of a model's ability to distinguish between the positive and negative classes: the higher the AUC, the better the model separates the two classes.
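A minimal sketch of computing AUC-ROC with scikit-learn, assuming the model exposes predicted probabilities (e.g. via `predict_proba`); the arrays below are placeholders:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# y_score: predicted probability of the positive class,
# e.g. model.predict_proba(X_test)[:, 1] (placeholder values here).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points for the ROC plot
print("AUC-ROC:", roc_auc_score(y_true, y_score))
```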
The PR-AUC is the area under the precision-recall curve and is especially useful with imbalanced classes, since it focuses on the trade-off between precision and recall. A higher PR-AUC indicates a better model, particularly on an imbalanced dataset.
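Correspondingly, the PR curve and its area can be computed as sketched below; `average_precision_score` is one common summary of the PR curve, and the trapezoidal `auc(recall, precision)` is an alternative (placeholder arrays again):

```python
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print("PR-AUC (trapezoidal):", auc(recall, precision))
print("Average precision:", average_precision_score(y_true, y_score))
```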
```
NEOCLARA-II/
│
├── data/             # Dataset used for training and testing models
├── models/           # Various classification models
├── evaluation/       # Scripts for evaluating models using the selected metrics
├── notebooks/        # Jupyter notebooks for exploratory analysis
├── results/          # Folder for storing evaluation results and visualizations
├── README.md         # Project documentation (this file)
└── requirements.txt  # List of required Python packages
```
- Data Preprocessing: Prepare and clean your dataset. Ensure it is ready for model training.
- Model Training: Train different classification models (e.g., Logistic Regression, Random Forest, SVM) on the dataset.
- Evaluation: Use the evaluation scripts to assess model performance using the confusion matrix, recall, AUC-ROC, and PR-AUC (a minimal end-to-end sketch follows this list).
- Results: Visualize and interpret the results to compare the models and identify the best-performing ones.
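The sketch below illustrates the workflow end to end on one model. The dataset path, target column name, and model choice are hypothetical placeholders; the actual scripts under evaluation/ may organize this differently.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; adjust to the actual files in data/.
df = pd.read_csv("data/neo_dataset.csv")
X, y = df.drop(columns=["is_hazardous"]), df["is_hazardous"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC-ROC:", roc_auc_score(y_test, y_score))
print("PR-AUC:", average_precision_score(y_test, y_score))
```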
After running the evaluation scripts, you will find the evaluation results, including confusion matrices, ROC and PR curves, and numeric metrics, in the results/ folder. Each model's performance is compared based on the four metrics described above.
This project is licensed under the MIT License - see the LICENSE file for details.