The goal of the algorithm is to find a final hypothesis with low error relative to a given distribution D over the
training examples.
— from the original paper by Yoav Freund and Robert E. Schapire [1]
- Analogy:
- Consult several doctors and combine their diagnoses into a weighted diagnosis, where each doctor’s weight is based on the accuracy of their previous diagnoses.
- How boosting works:
- Weights are assigned to each training tuple.
- A series of k classifiers is iteratively learned.
- After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1 to pay more attention to the training tuples that were misclassified by Mi.
- The final M* combines the votes of each individual classifier, where the weight of each classifier’s vote is a function of its accuracy.
- Classification:
- Each classifier Mi returns its class prediction.
- The boosted classifier M* tallies the weighted votes and assigns to X the class with the highest total weight.
- The boosting algorithm can also be extended to numeric prediction.
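As a toy illustration of such a weighted vote (the class labels and weights below are made up for the example, not taken from any dataset):

```python
# Three hypothetical weak classifiers vote on one tuple; each vote carries
# a weight derived from that classifier's accuracy (values invented here).
votes = [("positive", 0.9), ("negative", 0.4), ("positive", 0.6)]

totals = {}
for predicted_class, weight in votes:
    totals[predicted_class] = totals.get(predicted_class, 0.0) + weight

print(max(totals, key=totals.get))   # "positive" wins with 1.5 against 0.4
```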
From the original paper by Yoav Freund and Robert E. Schapire [1]:

Reload the page if the color scheme of the image doesn't match your theme.
![Algorithm-light img](/Daniil-Horobets/AdaBoost/raw/main/assets/Algorithm-light.png)
- Given a data set D of d class-labeled tuples: $(x_1, y_1), \dots ,(x_d, y_d)$ with $y_j ∈ Y = \{1, \dots ,c\}$.
- Initialize empty lists to hold information per classifier: w, β, M ← empty list.
- Initialize the weights for the first classifier so that each tuple has the same probability: $\LARGE w_j^1 ← \frac{1}{d}$
- Generate K classifiers in K iterations. At iteration k,
  - Calculate “normalized” weights: $\LARGE \textbf{p}^k = \frac{\textbf{w}^k}{Σ_{j=1}^d w_j^k}$
  - Sample the dataset with replacement according to $\textbf{p}^k$ to form the training set $D_k$.
  - Derive a classification model $M_k$ from $D_k$.
  - Calculate the error $ε_k$ by using $D_k$ as a test set as follows: $\LARGE ε_k = Σ_{j=1}^d p_j^k \cdot \text{err}(M_k, x_j, y_j)$, where $\text{err}(M_k, x_j, y_j)$ returns 1 if $M_k(x_j) \neq y_j$ and 0 otherwise.
  - If $ε_k > 0.5$: abandon this classifier and go back to step 1 (sample a new training set $D_k$).
  - Calculate $\LARGE β_k = \frac{ε_k}{1 - ε_k}$.
  - Update the weights for the next iteration: $\LARGE w_j^{k+1} = w_j^k β_k^{1−\text{err}(M_k, x_j, y_j)}$, so correctly classified tuples have their weights multiplied by $β_k$ while misclassified tuples keep their weights.
  - Add $\textbf{w}^{k+1}$, $M_k$, and $β_k$ to their respective lists.
See the implementation of the Training part in the fit
function. You can view it in adaboost.py
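For orientation, here is a minimal, assumption-laden sketch of the training loop described above. It uses a depth-1 decision tree from scikit-learn as the weak learner and NumPy arrays for X and y; it is an illustration under those assumptions, not the repository's fit implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed weak learner, not necessarily the repo's choice


def fit_sketch(X, y, K=10):
    """Sketch of the training loop above; X and y are NumPy arrays."""
    d = len(X)
    w = np.full(d, 1.0 / d)              # equal initial weights w_j^1 = 1/d
    models, betas = [], []

    k = 0
    while k < K:
        p = w / w.sum()                  # "normalized" weights p^k
        idx = np.random.choice(d, size=d, replace=True, p=p)   # sample D_k with replacement
        model = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])

        err = (model.predict(X) != y).astype(float)   # err(M_k, x_j, y_j) per tuple
        epsilon = float(np.sum(p * err))              # weighted error ε_k
        if epsilon > 0.5:
            continue                     # abandon this classifier, resample and retry
        epsilon = max(epsilon, 1e-10)    # guard so a perfect classifier doesn't zero all weights

        beta = epsilon / (1.0 - epsilon)
        w = w * beta ** (1.0 - err)      # shrink weights of correctly classified tuples
        models.append(model)
        betas.append(beta)
        k += 1

    return models, betas
```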
- Initialize the weight of each class to zero.
- For each classifier i in the k classifiers:
  - Calculate the weight of this classifier’s vote: $\LARGE w_i = \log (\frac{1}{β_i})$
  - Get the class prediction c for the (single) tuple x from the current weak classifier $M_i$: $\quad c = M_i(x)$
  - Add $w_i$ to the weight for class c.
- Return the predicted class with the largest weight.
- Mathematically, this can be formulated as:
  $\LARGE M(x) = \text{argmax}_{y∈Y} Σ_{i=1}^k (\log (\frac{1}{β_i})) \cdot [M_i(x) = y]$ ,
  where $[M_i(x) = y]$ equals 1 if $M_i$ predicts class y and 0 otherwise.
See the implementation of the Prediction part in the predict
function. You can view it in adaboost.py
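Continuing the same hypothetical sketch (with models and betas as returned by fit_sketch above), the weighted vote could look like this; again, this is an illustration, not the repository's predict:

```python
import numpy as np


def predict_sketch(x, models, betas):
    """Sketch of the weighted vote above for a single tuple x (a feature vector)."""
    class_weights = {}                          # weight of each class starts at zero
    for model, beta in zip(models, betas):
        w_i = np.log(1.0 / beta)                # this classifier's vote weight
        c = model.predict([x])[0]               # class prediction for the single tuple x
        class_weights[c] = class_weights.get(c, 0.0) + w_i
    return max(class_weights, key=class_weights.get)   # class with the largest total weight
```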
import pandas as pd

dataset = pd.read_csv("dataset/car_train.csv")  # load the (binary) car evaluation training data
dataset.head()
This dataset is a slightly modified version of the car evaluation dataset from the UCI Machine Learning Repository. The original dataset has four class values; for the sake of this example, it was modified for binary classification.
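As an illustration of what such a binarization could look like (the original UCI class values are unacc, acc, good, and vgood; the mapping below is an assumption, not necessarily how the repository's CSV was produced):

```python
import pandas as pd

# Hypothetical mapping of the four original class values to a binary target.
labels = pd.Series(["unacc", "acc", "good", "vgood", "unacc"])
binary = labels.map({"unacc": 0, "acc": 1, "good": 1, "vgood": 1})
print(binary.tolist())   # [0, 1, 1, 1, 0]
```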
| Metric | Value |
|---|---|
| Accuracy | 0.89 |
| Recall | 0.75 |
| Specificity | 0.94 |
| Area Under the Curve (AUC) | 0.85 |
| F1 score | 0.80 |
Confusion Matrix:
Receiver Operating Characteristic (ROC) curve:
Values may differ in each run
See the implementation and additional metric calculations in main.py
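For orientation, the reported metrics could be computed with scikit-learn roughly as follows. This is a hedged sketch, not main.py's actual code; y_test, y_pred, and y_score are assumed to come from the trained model on a held-out test set:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             recall_score, roc_auc_score)


def report(y_test, y_pred, y_score):
    """y_pred: predicted labels; y_score: predicted probability of the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print("Accuracy   :", accuracy_score(y_test, y_pred))
    print("Recall     :", recall_score(y_test, y_pred))
    print("Specificity:", tn / (tn + fp))          # sklearn has no direct specificity function
    print("AUC        :", roc_auc_score(y_test, y_score))
    print("F1 score   :", f1_score(y_test, y_pred))
```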
[1] Yoav Freund, Robert E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, Volume 55, Issue 1, 1997, Pages 119–139, ISSN 0022-0000, https://doi.org/10.1006/jcss.1997.1504 (https://www.sciencedirect.com/science/article/pii/S002200009791504X). Available under an Elsevier user license.
The code and theory constitute a component of the "Knowledge Discovery in Databases" course exercise offered by
Friedrich-Alexander-Universität Erlangen-Nürnberg.
Licensed under the GNU General Public License v3.0.