Learn Python
Learn Data Structures & Algorithms
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn GitHub
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model Basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Here, the chain rule means understanding the steps that happen in an ANN and how those steps are carried out.
Suppose we have four input features X1, X2, X3, X4, two hidden layers, and one output layer. The first hidden
layer has three neurons, and the second hidden layer has two neurons. We know that each node of the input
layer is connected with each neuron of the first hidden layer. If there are multiple hidden layers, then each
neuron of one hidden layer is also connected with every neuron of the next hidden layer, and the last hidden
layer's neurons are connected with the output layer. So when all nodes X1, X2, X3, and X4 are connected with
the first hidden layer, each connection gets a different weight.
So for the first node, the weights are W1(1), W1(2), W1(3). Here 1, 2, 3 in brackets indicate the neuron
number in the first hidden layer. W1 means the weight from the first node of the input layer, and (1) means
the first neuron of the first hidden layer. So W1(1) means the first node's weight for the first neuron of the
first hidden layer.
Similarly
For the first hidden layer:
Weights for the first node are: W1(1),W1(2),W1(3)
Weights for the second node are: W2(1),W2(2),W2(3)
Weights for the third node are: W3(1),W3(2),W3(3)
Weights for the fourth node are: W4(1),W4(2),W4(3)
We know that all neurons and nodes are connected. So if the input layer nodes are connected with each neuron
of the first hidden layer, then the first hidden layer neurons will also be connected with the second hidden
layer neurons, and those connections also have weights.
So the weights are
For the second hidden layer:
W11(1),W11(2)
W22(1),W22(2)
W33(1),W33(2)
Here W11 means the first neuron of the first hidden layer and (1) means the first neuron of the second hidden layer.
Now the second hidden layer is connected with the output layer, and the output layer weights are:
W111(1), W222(1)
Here W111 means the first neuron of the second hidden layer, W222 means the second neuron of the second
hidden layer, and (1) means the first neuron of the output layer. Here we have only one output neuron, but
there can be multiple neurons.
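To make this notation concrete, here is a minimal sketch (assuming NumPy and a random initialization, which the text has not discussed yet) of how these weights can be stored as matrices: a 4x3 matrix between the input layer and the first hidden layer, a 3x2 matrix between the two hidden layers, and a 2x1 matrix between the second hidden layer and the output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# W1[i, j] plays the role of W(i+1)(j+1): weight from input node i+1
# to first-hidden-layer neuron j+1 (4 nodes -> 3 neurons).
W1 = rng.normal(size=(4, 3))

# W2[i, j]: weight from first-hidden-layer neuron i+1 to
# second-hidden-layer neuron j+1 (W11(1), W11(2), W22(1), ...).
W2 = rng.normal(size=(3, 2))

# W3[i, 0] corresponds to W111(1) and W222(1): second hidden layer -> output neuron.
W3 = rng.normal(size=(2, 1))

print(W1.shape, W2.shape, W3.shape)  # (4, 3) (3, 2) (2, 1)
```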
We know that some calculations happen inside the network.
Calculations that happen in forward propagation:
We divide it into two steps:
Step 1:
Summation of all input node values multiplied by their weights, plus a bias.
Formula: y = W1*X1 + W2*X2 + W3*X3 + .... + Wn*Xn + bias
In this case:
For the first hidden layer:
for the first neuron: y1 = X1*{W1(1)} + X2*{W2(1)} + X3*{W3(1)} + X4*{W4(1)} + bias
for the second neuron: y2 = X1*{W1(2)} + X2*{W2(2)} + X3*{W3(2)} + X4*{W4(2)} + bias
for the third neuron: y3 = X1*{W1(3)} + X2*{W2(3)} + X3*{W3(3)} + X4*{W4(3)} + bias
Step 2:
Run the activation function. In this case we are using ReLU.
So it will look like: relu(y1), relu(y2), relu(y3).
After doing this calculation, the work of the first hidden layer is done.
Now, these steps happen in each hidden layer's neurons. This means these outputs, together with the next
layer's weights, go to the second hidden layer neurons, where step 1 and step 2 happen again; then the outputs
of the second hidden layer neurons and the next weights go to our output layer node. The same steps also
happen in the output layer. In the output layer we can use a different activation function, like sigmoid.
After the calculation, we get the output from our output layer.
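As a rough sketch of these two steps in code (assuming NumPy, ReLU in the hidden layers, a sigmoid output, and random weights; none of these choices is fixed by the text), the full forward propagation for this 4-3-2-1 network could look like this:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, 0.1, 0.3, 0.8])             # X1..X4

W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)  # input -> first hidden layer
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)  # first -> second hidden layer
W3, b3 = rng.normal(size=(2, 1)), np.zeros(1)  # second hidden layer -> output

# Step 1 (weighted sum + bias) and step 2 (activation), repeated layer by layer
o1 = relu(x @ W1 + b1)          # outputs of the first hidden layer (3 values)
o2 = relu(o1 @ W2 + b2)         # outputs of the second hidden layer (2 values)
y_hat = sigmoid(o2 @ W3 + b3)   # output of the output layer (1 value)

print(y_hat)
```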
Sometimes we get a wrong result or prediction, and this is a problem. To measure how wrong the output is, we
use a loss or cost function. A loss or cost function is nothing but the difference between our actual value
and our predicted value. After getting the loss value by applying the loss function, we try to reduce it and
make it as close to zero as possible. To reduce it, we use an optimizer.
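For example, with a single actual value and a single predicted value, a mean squared error loss (one common choice; the text does not fix a particular loss function) is simply:

```python
def mse_loss(y_true, y_pred):
    # squared difference between the actual and the predicted value
    return (y_true - y_pred) ** 2

print(mse_loss(1.0, 0.7))  # about 0.09 -- the optimizer will try to push this toward zero
```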
Now let's name the outputs.
Outputs of the first hidden layer:
O11(1), O11(2) (first neuron), O22(1), O22(2) (second neuron), O33(1), O33(2) (third neuron)
Outputs of the second hidden layer:
O111(1) (first neuron), O222(1) (second neuron)
The output of the output layer neuron is O1111.
In back propagation, we try to reduce the loss by updating the weights with the help of an optimizer. The
optimizer tries to update all the weights in such a way that the predicted value becomes equal to the actual
value, or close to it. This way we get a proper output, and this is how an ANN model actually works.
Now let's see how this weight updating happens.
Here we will use gradient descent as the optimizer.
Formula:
Wt = Wt-1 - η(dL/dWt-1)
Here,
Wt = new weight
Wt-1 = previous weight
η = learning rate
dL/dWt-1 = slope, i.e. the derivative of the loss with respect to the previous weight
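In code, this update rule is a single line. A minimal sketch, assuming the slope dL/dWt-1 has already been computed (for example by the chain rule described next):

```python
def gradient_descent_step(w_old, grad, learning_rate=0.01):
    # Wt = Wt-1 - eta * (dL/dWt-1)
    return w_old - learning_rate * grad

print(gradient_descent_step(0.5, 2.0))  # 0.5 - 0.01*2.0 = 0.48
```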
We are updating the weights to reduce the loss. For the weight we want to update, we have to take all those
output values which are impacted by that weight. Suppose here the weights between the second hidden layer and
the output layer are W111(1) and W222(1). These weights impact only the O1111 output. So to update such a
weight, in the formula where we find the slope (dL/dWt-1), we have to take all those output values which that
particular weight impacts. Here W111(1) and W222(1) impact only the O1111 output, so we have to take the value
of the O1111 output in the slope (dL/dWt-1) to update the W111(1) or W222(1) weights.
So according to the chain rule, the weight update for W111(1) is:
W111(1)new = W111(1)old - η{dL/dW111(1)old}
=> W111(1)new = W111(1)old - η[(dL/dO1111)*(dO1111/dW111(1)old)]
For W222(1):
W222(1)new = W222(1)old - η{dL/dW222(1)old}
=> W222(1)new = W222(1)old - η[(dL/dO1111)*(dO1111/dW222(1)old)]
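For instance, if the output neuron were linear and the loss were a squared error (simplifying assumptions made here only to keep the derivative short, not something the text specifies), dO1111/dW111(1) is just O111(1), so dL/dW111(1) = 2*(O1111 - actual)*O111(1). A tiny sketch:

```python
def grad_W111_1(o1111, actual, o111_1):
    # dL/dW111(1) = (dL/dO1111) * (dO1111/dW111(1))
    #             = 2*(O1111 - actual) * O111(1)   (squared-error loss, linear output neuron)
    return 2 * (o1111 - actual) * o111_1

print(grad_W111_1(0.7, 1.0, 0.4))  # about -0.24
```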
Now, if we want to update the weights of the second hidden layer's first neuron, what will happen?
See, here the weights coming into the second hidden layer's first neuron are W11(1), W22(1), and W33(1). These
weights impact the O111(1) and O1111 outputs. So to update these weights we have to take these two output values.
So according to the chain rule, the weight update for W11(1) is:
W11(1)new = W11(1)old - η{dL/dW11(1)old}
=> W11(1)new = W11(1)old - η[(dL/dO1111)*(dO1111/dO111(1))*(dO111(1)/dW11(1)old)]
For W22(1):
W22(1)new = W22(1)old - η{dL/dW22(1)old}
=> W22(1)new = W22(1)old - η[(dL/dO1111)*(dO1111/dO111(1))*(dO111(1)/dW22(1)old)]
For W33(1):
W33(1)new = W33(1)old - η{dL/dW33(1)old}
=> W33(1)new = W33(1)old - η[(dL/dO1111)*(dO1111/dO111(1))*(dO111(1)/dW33(1)old)]
So the chain rule works like this: to update a weight, we have to take all the output values which that
particular weight impacts, and we have to take those output values in the backward direction. Look here: the
O1111 output comes after the O111(1) output, and the O111(1) output comes after the O11(1) output. It means
O11(1), then O111(1), then O1111. But in the formula we first took O1111, which is the last one, then we took
the O111(1) output, and then the O11(1) output. It means O1111 --> O111(1) --> O11(1). So we can say that we
are taking the outputs in the backward direction. That's why we call this process back propagation.
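To see the chain rule and this backward direction concretely, here is a small sketch (assuming linear neurons and a squared-error loss purely to keep the derivatives short; with ReLU each layer would just contribute one extra factor) that computes dL/dW11(1) exactly as in the formula above and checks it against a numerical gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, 0.1, 0.3, 0.8])   # X1..X4
y = 1.0                              # actual value

W1 = rng.normal(size=(4, 3))   # input -> first hidden layer
W2 = rng.normal(size=(3, 2))   # first -> second hidden layer (W11(1), W11(2), ...)
W3 = rng.normal(size=(2, 1))   # second hidden layer -> output (W111(1), W222(1))

def forward(W2_):
    o1 = x @ W1            # first hidden layer outputs
    o2 = o1 @ W2_          # second hidden layer outputs (O111(1), O222(1))
    o3 = (o2 @ W3)[0]      # output layer output (O1111)
    return o1, o2, o3

o1, o2, o3 = forward(W2)
loss = (o3 - y) ** 2

# Chain rule, taken in the backward direction O1111 -> O111(1) -> W11(1):
dL_dO1111    = 2 * (o3 - y)   # dL/dO1111
dO1111_dO111 = W3[0, 0]       # dO1111/dO111(1)
dO111_dW11   = o1[0]          # dO111(1)/dW11(1)
dL_dW11 = dL_dO1111 * dO1111_dO111 * dO111_dW11

# Numerical check of the same derivative (perturb W11(1) slightly and re-run forward)
eps = 1e-6
W2_plus = W2.copy()
W2_plus[0, 0] += eps
_, _, o3_plus = forward(W2_plus)
numerical = ((o3_plus - y) ** 2 - loss) / eps

print(dL_dW11, numerical)   # the two values should agree closely
```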