Classification using Machine Learning Techniques on Breast Cancer Wisconsin Data Set (diagnosis) Using Support Vector Machines with the accuracy of 0.96
This Project use Wisconsin Data Set https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). which consist
Data Set Characteristics: Multivariate
Number of Instances:569
Area:Life
Attribute Characteristics:Real
Number of Attributes:32
Associated Tasks:Classification
Number of Web Hits: 9752918
- ID number
- Diagnosis (M = malignant, B = benign)
Ten real-valued features are computed for each cell nucleus:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)
The objective of this model is to find whether the cancer is malignat or benign
- python
- Tableau(Optional)
- google colab/jupyter notebook
- Data manipulation using:
- pandas,
- plotting using seaborn and matplotlib,
- Feature selection
- Grid search
Once I employed all these methods, we were able to utilize various machine learning metrics. Each model provided valuable insight.
The best result is given by svm with hyperparameter using GridSearchCV
method
C: 20
gamma': 0.2
kernel: 'rbf'
precision | recall | f1-score | support | |
---|---|---|---|---|
0.0 | 0.90 | 1.00 | 0.95 | 43 |
1.0 | 1.00 | 0.93 | 0.96 | 71 |
accuracy | 0.96 | 114 | ||
macro avg | 0.95 | 0.96 | 0.95 | 114 |
weighted avg | 0.96 | 0.96 | 0.96 | 114 |