- Take the input data `x`
- Make a prediction using the weights `w` and biases `b`
- Compare the prediction to the desired output with a loss function such as MSE
- Backpropagate through the model to get the gradients of `w` and `b`
- Apply the gradients, scaled by the learning rate
- Repeat the cycle (a minimal code sketch of this loop follows the list)
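A minimal sketch of this cycle, assuming a single linear model trained with plain gradient descent on MSE; the toy data, learning rate, and iteration count are made up purely for illustration:

```python
import numpy as np

# Made-up toy data: the targets follow y = 2*x1 + 3*x2 + 1
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y_true = np.array([4.0, 3.0, 6.0, 8.0])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for step in range(2000):
    # forward: make a prediction using w and b
    y_pred = X @ w + b
    # compare the prediction to the desired output with MSE
    error = y_pred - y_true
    loss = np.mean(error ** 2)
    # backpropagate: gradients of the loss w.r.t. w and b
    grad_w = 2 * X.T @ error / len(X)
    grad_b = 2 * error.mean()
    # apply the gradients using the learning rate
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, loss)  # w should approach [2, 3], b should approach 1
```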
- Every layer is a linear layer, meaning it looks like
  $$f(x) = (weight_1 \cdot x_1) + (weight_2 \cdot x_2) + bias$$
  or, with non-linear input features (the layer is still linear in its weights),
  $$f(x) = weight_1\cdot x^2 + weight_2\cdot x + bias$$
- In between layers there is a non-linear function, such as the sigmoid function, and it has to be differentiable
- Forward pass
- Calculate the error with a function like MSE, which takes the prediction and the ground truth and outputs the error
- From the error function, differentiate backwards through each layer
- Apply the derivatives to the parameters `w` and `b` (the forward half is sketched in code just below; the backward half is worked out at the end of this section)
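A sketch of the forward half, assuming a tiny two-layer network with a sigmoid between the linear layers and MSE at the end; the shapes, weight values, and variable names (`W1`, `W2`, `bias1`, `bias2`) are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Invented example values
x = np.array([0.5, -1.0])                  # input
W1 = np.array([[0.1, -0.2], [0.3, 0.4]])   # first linear layer: weights
bias1 = np.zeros(2)                        # first linear layer: bias
W2 = np.array([[-0.5, 0.6]])               # second linear layer: weights
bias2 = np.zeros(1)                        # second linear layer: bias
y_true = np.array([1.0])

# forward pass: linear layer -> sigmoid -> linear layer -> sigmoid
h = sigmoid(W1 @ x + bias1)
y_pred = sigmoid(W2 @ h + bias2)

# error between the prediction and the ground truth
error = mse(y_true, y_pred)
print(error)
```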
```mermaid
stateDiagram-v2
    x1 --> h1: w1
    x2 --> h1: w2
    h1 --> a1: w5
    x2 --> h2: w4
    x1 --> h2: w3
    h2 --> a1: w6
    a1: a1 (+b3)
    h1: h1 (+b1)
    h2: h2 (+b2)
```
- We have an activation function (sigmoid) on every neuron
- At the end we use the loss function: MSE

On every layer there is an activation function:

$$f(x) = sigmoid(x) = \frac{1}{1+e^{-x}}$$
We can write the loss function in terms of the weights:
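Spelling this out for the network in the diagram (writing $\sigma$ for the sigmoid and $y_{pred}$ for the output of $a_1$; this is just the diagram's forward pass written as formulas):

$$h_1 = \sigma(w_1 x_1 + w_2 x_2 + b_1) \qquad h_2 = \sigma(w_3 x_1 + w_4 x_2 + b_2)$$

$$y_{pred} = a_1 = \sigma(w_5 h_1 + w_6 h_2 + b_3)$$

$$mse(y_{true}, y_{pred}) = (y_{true} - y_{pred})^2$$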
- MSE derivative (with respect to the prediction): $\frac{\partial}{\partial y_{pred}}\, mse(y_{true}, y_{pred}) = -2(y_{true} - y_{pred})$
- Sigmoid derivative: $f'(x) = \left(\frac{1}{1+e^{-x}}\right)' = \frac{e^{-x}}{(1+e^{-x})^2} = f(x)\cdot(1-f(x))$
Then, to get the derivative of the loss with respect to each parameter, we use the chain rule:
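For example, for $w_5$ (introducing $z_3 = w_5 h_1 + w_6 h_2 + b_3$, a name not used above, for the pre-activation of $a_1$):

$$\frac{\partial\, mse}{\partial w_5} = \frac{\partial\, mse}{\partial y_{pred}} \cdot \frac{\partial y_{pred}}{\partial z_3} \cdot \frac{\partial z_3}{\partial w_5}$$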
We can calculate each factor in this chain using the MSE and sigmoid derivatives above:
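$$\frac{\partial\, mse}{\partial y_{pred}} = -2(y_{true} - y_{pred}) \qquad \frac{\partial y_{pred}}{\partial z_3} = y_{pred}(1 - y_{pred}) \qquad \frac{\partial z_3}{\partial w_5} = h_1$$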
So, simplifying the derivatives and writing them in terms of the parameters:
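Worked out for the output-layer parameters, and continuing the chain through the hidden layer for the first-layer ones (shown for $w_5$, $b_3$, and $w_1$; the other parameters follow the same pattern):

$$\frac{\partial\, mse}{\partial w_5} = -2(y_{true} - y_{pred}) \cdot y_{pred}(1 - y_{pred}) \cdot h_1$$

$$\frac{\partial\, mse}{\partial b_3} = -2(y_{true} - y_{pred}) \cdot y_{pred}(1 - y_{pred})$$

$$\frac{\partial\, mse}{\partial w_1} = -2(y_{true} - y_{pred}) \cdot y_{pred}(1 - y_{pred}) \cdot w_5 \cdot h_1(1 - h_1) \cdot x_1$$

As a sanity check, here is a minimal NumPy sketch of one training step for exactly this 2-2-1 network, using the derivatives above; the input values, target, initial weights, and learning rate are made-up numbers chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up input, target, initial parameters, and learning rate (illustration only)
x1, x2, y_true = 0.5, -1.0, 1.0
w1, w2, w3, w4, w5, w6 = 0.1, -0.2, 0.3, 0.4, -0.5, 0.6
b1, b2, b3 = 0.0, 0.0, 0.0
lr = 0.1

# Forward pass through the diagrammed network
z1 = w1 * x1 + w2 * x2 + b1; h1 = sigmoid(z1)
z2 = w3 * x1 + w4 * x2 + b2; h2 = sigmoid(z2)
z3 = w5 * h1 + w6 * h2 + b3; y_pred = sigmoid(z3)
loss = (y_true - y_pred) ** 2                 # MSE for a single sample

# Backward pass using the simplified derivatives above
d_pred = -2 * (y_true - y_pred)               # MSE derivative
d_z3 = d_pred * y_pred * (1 - y_pred)         # through the output sigmoid
grad_w5, grad_w6, grad_b3 = d_z3 * h1, d_z3 * h2, d_z3

d_z1 = d_z3 * w5 * h1 * (1 - h1)              # chain continues into h1
d_z2 = d_z3 * w6 * h2 * (1 - h2)              # and into h2
grad_w1, grad_w2, grad_b1 = d_z1 * x1, d_z1 * x2, d_z1
grad_w3, grad_w4, grad_b2 = d_z2 * x1, d_z2 * x2, d_z2

# Apply the gradients using the learning rate
w1, w2, b1 = w1 - lr * grad_w1, w2 - lr * grad_w2, b1 - lr * grad_b1
w3, w4, b2 = w3 - lr * grad_w3, w4 - lr * grad_w4, b2 - lr * grad_b2
w5, w6, b3 = w5 - lr * grad_w5, w6 - lr * grad_w6, b3 - lr * grad_b3

print(loss)  # repeating this cycle drives the loss down
```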