
Simple Example of AdaBoost Implementation using Python


AdaBoost ("Adaptive Boosting" 1)

Description

"The goal of the algorithm is to find a final hypothesis with low error relative to a given distribution D over the training examples."
— from the original paper by Yoav Freund and Robert E. Schapire [1]

Boosting

  • Analogy:
    • Consult several doctors and combine their diagnoses as a weighted vote, where each weight is assigned based on the accuracy of that doctor's previous diagnoses.
  • How boosting works (see the sketch after this list):
    • Weights are assigned to each training tuple.
    • A series of k classifiers is iteratively learned.
    • After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi.
    • The final M* combines the votes of each individual classifier, where the weight of each classifier’s vote is a function of its accuracy.
  • Classification:
    • Each classifier Mi returns its class prediction.
    • The boosted classifier M* tallies the weighted votes and assigns the class with the highest total vote weight to X.
  • The boosting algorithm can be extended to numeric prediction.
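For illustration, here is a minimal, schematic sketch of such a boosting loop. This is not the repository's code: the weak learner (a scikit-learn decision stump) is an assumption, and it uses the standard exponential reweighting rule for binary labels rather than the β-based update of AdaBoost.M1 described in the next section.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, k=10):
    """Schematic boosting loop; y is assumed to be a binary label array."""
    X, y = np.asarray(X), np.asarray(y)
    d = len(y)
    weights = np.full(d, 1.0 / d)                   # every training tuple starts equally weighted
    models, votes = [], []
    for _ in range(k):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)      # learn classifier M_i on the weighted data
        wrong = stump.predict(X) != y               # tuples misclassified by M_i
        error = np.sum(weights[wrong]) / np.sum(weights)
        vote = 0.5 * np.log((1.0 - error) / max(error, 1e-10))  # vote weight grows with accuracy
        weights *= np.exp(np.where(wrong, vote, -vote))          # emphasize misclassified tuples
        weights /= weights.sum()
        models.append(stump)
        votes.append(vote)
    return models, votes
```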

AdaBoost.M1 Algorithm

From the original paper by Yoav Freund and Robert E. Schapire [1]:


[Image: AdaBoost.M1 algorithm pseudocode]

Simplified Step-by-Step Interpretation

Training:

  • Given a data set D of d class-labeled tuples: (x1, y1), ..., (xd, yd)
    with yj ∈ Y = {1, ..., c}.
  • Initialize empty lists to hold information per classifier: w, β, M ← empty list.
  • Initialize weights for the first classifier so that each tuple has the same probability: $\LARGE w_j^1 = \frac{1}{d}$
  • Generate K classifiers in K iterations. At iteration k,
    1. Calculate “normalized” weights:
      $\LARGE \textbf{p}^k = \frac{\textbf{w}^k}{Σ_{j=1}^d w_j^k}$
    2. Sample dataset with replacement according to pk to form training set Dk.
    3. Derive classification model Mk from Dk.
    4. Calculate error εk by using Dk as a test set as follows:
      $\LARGE ε_k = Σ_{j=1}^d p_j^k \cdot \text{err}(M_k, x_j, y_j)$,
      where the misclassification error $\text{err}(M_k, x_j, y_j)$ returns 1 if Mk(xj) $\neq$ yj, otherwise it returns 0.
    5. If $ε_k$ > 0.5: abandon this classifier and go back to step 1.
    6. Calculate
      $\LARGE \textbf{β}_k = \frac{ε_k}{1 - ε_k}$.
    7. Update weights for the next iteration:
      $\LARGE w_j^{k+1} = w_j^kβ_k^{1−\text{err}(M_k, x_j, y_j)}$.
      If a tuple is misclassified, its weight remains the same; otherwise it is multiplied by βk and therefore decreased. Relative to the correctly classified tuples, the weights of the misclassified tuples are thereby increased.
    8. Add wk+1, Mk, and βk to their respective lists.

See the implementation of the Training part in the fit function. You can view it in adaboost.py
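As an illustration of the steps above, a minimal sketch of such a fit function might look as follows. This is a sketch, not necessarily the repository's implementation: the weak learner (a scikit-learn decision stump) and the retry limit for abandoned classifiers are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit(X, y, K=20, max_retries=100):
    """Sketch of AdaBoost.M1 training, following steps 1-8 above."""
    X, y = np.asarray(X), np.asarray(y)
    d = len(y)
    w = np.full(d, 1.0 / d)                                 # w_j^1 = 1/d
    models, betas = [], []
    retries = 0
    while len(models) < K and retries < max_retries:
        p = w / w.sum()                                     # step 1: normalized weights p^k
        idx = np.random.choice(d, size=d, replace=True, p=p)           # step 2: sample D_k with replacement
        M_k = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])  # step 3: derive M_k from D_k
        err = (M_k.predict(X) != y).astype(float)           # err(M_k, x_j, y_j) ∈ {0, 1}
        epsilon = np.sum(p * err)                           # step 4: weighted error ε_k
        if epsilon > 0.5:                                   # step 5: abandon this classifier and retry
            retries += 1
            continue
        beta = max(epsilon, 1e-10) / (1.0 - epsilon)        # step 6: β_k (guarded against ε_k = 0)
        w = w * beta ** (1.0 - err)                         # step 7: decrease correctly classified weights
        models.append(M_k)                                  # step 8: store M_k and β_k
        betas.append(beta)
    return models, betas
```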

Prediction:

  • Initialize the weight of each class to zero.
  • For each of the k classifiers Mi:
    1. Calculate the weight of this classifier’s vote:
      $\LARGE w_i = \log (\frac{1}{β_i})$.
    2. Get class prediction c for (single) tuple x from current weak classifier $M_i: \quad c = M_i(x)$.
    3. Add wi to weight for class c.
  • Return predicted class with the largest weight.
  • Mathematically, this can be formulated as:
    $\LARGE M(x) = \text{argmax}_{y∈Y} Σ_{i=1}^k (\log (\frac{1}{β_i})) \cdot [M_i(x) = y]$,
    where $[M_i(x) = y]$ is 1 if Mi predicts class y and 0 otherwise.

See the implementation of the Prediction part in the predict function. You can view it in adaboost.py
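Continuing the sketch above (again an illustration rather than the repository's exact predict function):

```python
import numpy as np

def predict(models, betas, x):
    """Sketch of the prediction procedure: a weighted vote over the weak classifiers."""
    class_weights = {}                                     # weight of each class starts at zero
    for M_i, beta_i in zip(models, betas):
        w_i = np.log(1.0 / max(beta_i, 1e-10))             # weight of this classifier's vote
        c = M_i.predict(np.asarray(x).reshape(1, -1))[0]   # class prediction for tuple x
        class_weights[c] = class_weights.get(c, 0.0) + w_i
    return max(class_weights, key=class_weights.get)       # class with the largest total weight
```

Together with the fit sketch above, this could be used as, e.g., `y_pred = [predict(models, betas, x) for x in X_test]`.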

Dataset

import pandas as pd

dataset = pd.read_csv("dataset/car_train.csv")
dataset.head()

[Image: output of dataset.head() — first rows of the dataset]


This dataset is a slightly modified version of the car evaluation dataset from the UCI Machine Learning Repository. Originally, the dataset has four class values; for the sake of this example it was modified to a binary classification problem.
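The exact preprocessing that produced car_train.csv is not shown here, but a hypothetical way to collapse the four original class values of the UCI dataset (unacc, acc, good, vgood) into a binary target might look like this (the file name, column names, and mapping are assumptions for illustration):

```python
import pandas as pd

# Load the original UCI car evaluation data (file name and column names assumed)
cars = pd.read_csv("car.data", header=None,
                   names=["buying", "maint", "doors", "persons",
                          "lug_boot", "safety", "class"])

# Hypothetical binarization: 1 = acceptable or better, 0 = unacceptable
cars["class"] = (cars["class"] != "unacc").astype(int)
```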

Output Metrics

| Metric                     | Value |
| -------------------------- | ----- |
| Accuracy                   | 0.89  |
| Recall                     | 0.75  |
| Specificity                | 0.94  |
| Area Under the Curve (AUC) | 0.85  |
| F1 score                   | 0.80  |

Confusion Matrix:

[Image: confusion matrix]

Receiver Operating Characteristic (ROC) curve:

[Image: ROC curve]

Values may differ between runs.
See the implementation and further metric calculations in main.py
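As a sketch of how such metrics could be computed from binary test labels and predictions (the repository's main.py may do this differently, e.g. by using prediction scores for the AUC):

```python
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

def report_metrics(y_true, y_pred):
    """Print the metrics listed above for binary labels and predictions."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("Accuracy:   ", accuracy_score(y_true, y_pred))
    print("Recall:     ", recall_score(y_true, y_pred))
    print("Specificity:", tn / (tn + fp))           # true-negative rate (no direct sklearn helper)
    print("AUC:        ", roc_auc_score(y_true, y_pred))
    print("F1 score:   ", f1_score(y_true, y_pred))
```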

Bibliography

[1] Yoav Freund, Robert E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, Volume 55, Issue 1, 1997, Pages 119-139, ISSN 0022-0000. https://doi.org/10.1006/jcss.1997.1504 (https://www.sciencedirect.com/science/article/pii/S002200009791504X)

References

The code and theory are part of an exercise in the "Knowledge Discovery in Databases" course offered by Friedrich-Alexander-Universität Erlangen-Nürnberg.
Released under the GNU General Public License v3.0.