Over nine years in deep space, NASA's Kepler space telescope carried out a planet-hunting mission to discover hidden planets outside our solar system.
To process this data, we need to create machine learning models capable of classifying candidate exoplanets from the raw dataset.
- Steps
- Preprocess the dataset prior to fitting the model.
- Perform feature selection and remove unnecessary features.
- Use `MinMaxScaler` to scale the numerical data.
- Separate the data into training and testing data.
- Use `GridSearch` to tune model parameters.
- Tune and compare at least two different classifiers.
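
The scaling, splitting, and tuning steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual notebook: the data is a synthetic three-class stand-in generated with `make_classification`, the parameter grids are hypothetical and deliberately small, and feature selection is omitted. `GridSearchCV` is assumed to be the scikit-learn class behind the `GridSearch` step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Kepler table (assumption, not the real data):
# three classes, mimicking CANDIDATE / CONFIRMED / FALSE POSITIVE.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

# Hold out a test set before any scaling or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical search grids, kept small so the sketch runs quickly.
searches = {
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"model__n_estimators": [25, 50], "model__max_depth": [5, None]},
    ),
    "svm": (
        SVC(),
        {"model__C": [1, 10], "model__kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (estimator, grid) in searches.items():
    # Scaling sits inside the pipeline, so the scaler is refit on the
    # training part of every cross-validation fold.
    pipe = Pipeline([("scale", MinMaxScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=3)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "best_score": search.best_score_,            # mean CV accuracy
        "test_score": search.score(X_test, y_test),  # held-out accuracy
    }

for name, summary in results.items():
    print(name, summary)
```

Putting the scaler inside the pipeline is a design choice rather than a requirement: it keeps the held-out fold from influencing the scaling ranges during the grid search.
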
- Comparison of each model's performance as well as a summary about the findings and assumptions made based on the models
- Random Forest best score: 0.8930001907304979
- SVM best score: 0.8706847224871257
- Best Grid Score: 0.8489414457371733
- With hyperparameter tuning, the Random Forest model's best score (0.8930001907304979) is higher than the SVM model's (0.8706847224871257).
- Without hyperparameter tuning and with less training, the Neural Network and Deep Learning models reach nearly the same accuracy (Neural Network model: 0.8758581280708313, Deep Learning model: 0.8747139573097229).
- From observation, the Random Forest model is better at predicting confirmed exoplanets: its F1-score for the CONFIRMED class is 0.81, while the SVM model scores 0.78 and the KNN model 0.72.
- All the models predict FALSE POSITIVE well, with F1-scores close to 1 (0.99).
- The same holds for CANDIDATE: the Random Forest model's F1-score is higher than the other models'.
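
For readers unfamiliar with per-class F1-scores, the sketch below shows how such a score can be computed one-vs-rest for each label using F1 = 2·TP / (2·TP + FP + FN). The helper name `per_class_f1` and the six example labels are made up for illustration; they are not the project's predictions and do not reproduce the scores reported above.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted label was wrong
            fn[truth] += 1  # the true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Made-up labels, for illustration only.
y_true = ["CONFIRMED", "CONFIRMED", "CANDIDATE",
          "CANDIDATE", "FALSE POSITIVE", "FALSE POSITIVE"]
y_pred = ["CONFIRMED", "CANDIDATE", "CANDIDATE",
          "CANDIDATE", "FALSE POSITIVE", "FALSE POSITIVE"]

scores = per_class_f1(y_true, y_pred)
for label, f1 in scores.items():
    print(f"{label}: {f1:.2f}")
```

In this toy example one CONFIRMED object is mislabeled as CANDIDATE, which lowers the F1-score of both classes while leaving FALSE POSITIVE at 1.0. In practice, scikit-learn's `classification_report` produces the same per-class figures.
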