Learn Python
Learn Data Structures & Algorithms
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn GitHub
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model Basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Here, the chain rule means understanding the steps that happen in an ANN and how those steps are carried out.
Suppose we have four input features X1, X2, X3, X4, two hidden layers, and one output layer. The first hidden
layer has three neurons, and the second hidden layer has two neurons. We know that each node of the input
layer is connected with each neuron of the first hidden layer. If there are multiple hidden layers, then each
neuron of one hidden layer is also connected with every neuron of the next hidden layer, and the last hidden
layer's neurons are connected with the output layer. So when all nodes X1, X2, X3, and X4 are connected with
the first hidden layer, each connection gets a different weight.
So for the first node, the weights are W1(1), W1(2), W1(3). Here 1, 2, 3 in brackets indicate the neuron
number in the first hidden layer. W1 means the weight from the first node of the input layer, and (1) means
the first neuron of the first hidden layer. So W1(1) means the first node's weight for the first neuron of the
first hidden layer.
Similarly
For the first hidden layer:
Weights for the first node are: W1(1),W1(2),W1(3)
Weights for the second node are: W2(1),W2(2),W2(3)
Weights for the third node are: W3(1),W3(2),W3(3)
Weights for the fourth node are: W4(1),W4(2),W4(3)
We know that all neurons and nodes are connected. So if the input layer nodes are connected with each neuron
of the first hidden layer, then the first hidden layer neurons will also be connected with the second hidden
layer neurons, and those connections also have weights.
So the weights are
For the second hidden layer:
W11(1),W11(2)
W22(1),W22(2)
W33(1),W33(2)
Here W11 means the first neuron of the first hidden layer and (1) means the first neuron of the second hidden layer.
Now the second hidden layer is connected with the output layer, and the output layer weights are:
W111(1), W222(1)
Here W111 means the first neuron of the second hidden layer, W222 means the second neuron of the second
hidden layer, and (1) means the first neuron of the output layer. Here we have only one output neuron, but
there can be multiple neurons.
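To make this notation concrete, here is a minimal sketch (assuming NumPy and a random initialization, which the text has not discussed yet) of how these weights can be stored as matrices: a 4x3 matrix between the input layer and the first hidden layer, a 3x2 matrix between the two hidden layers, and a 2x1 matrix between the second hidden layer and the output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# W1[i, j] plays the role of W(i+1)(j+1): weight from input node i+1
# to first-hidden-layer neuron j+1 (4 nodes -> 3 neurons).
W1 = rng.normal(size=(4, 3))

# W2[i, j]: weight from first-hidden-layer neuron i+1 to
# second-hidden-layer neuron j+1 (W11(1), W11(2), W22(1), ...).
W2 = rng.normal(size=(3, 2))

# W3[i, 0] corresponds to W111(1) and W222(1): second hidden layer -> output neuron.
W3 = rng.normal(size=(2, 1))

print(W1.shape, W2.shape, W3.shape)  # (4, 3) (3, 2) (2, 1)
```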
We know that some calculations happen inside the network.
Calculations that happen in forward propagation:
We divide it into two steps:
Step 1:
Summation of all input node values multiplied by their weights, plus a bias.
Formula: y = W1*X1 + W2*X2 + W3*X3 + .... + Wn*Xn + bias
In this case:
For the first hidden layer:
for the first neuron: y1 = X1*{W1(1)} + X2*{W2(1)} + X3*{W3(1)} + X4*{W4(1)} + bias
for the second neuron: y2 = X1*{W1(2)} + X2*{W2(2)} + X3*{W3(2)} + X4*{W4(2)} + bias
for the third neuron: y3 = X1*{W1(3)} + X2*{W2(3)} + X3*{W3(3)} + X4*{W4(3)} + bias
Step 2:
Run the activation function. In this case we are using ReLU.
So it will look like: relu(y1), relu(y2), relu(y3).
After doing this calculation, the work of the first hidden layer is done.
Now, these steps happen in each hidden layer's neurons. This means these outputs, together with the next
layer's weights, go to the second hidden layer neurons, where step 1 and step 2 happen again; then the outputs
of the second hidden layer neurons and the next weights go to our output layer node. The same steps also
happen in the output layer. In the output layer we can use a different activation function, like sigmoid.
After the calculation, we get the output from our output layer.
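As a rough sketch of these two steps in code (assuming NumPy, ReLU in the hidden layers, a sigmoid output, and random weights; none of these choices is fixed by the text), the full forward propagation for this 4-3-2-1 network could look like this:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, 0.1, 0.3, 0.8])             # X1..X4

W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)  # input -> first hidden layer
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)  # first -> second hidden layer
W3, b3 = rng.normal(size=(2, 1)), np.zeros(1)  # second hidden layer -> output

# Step 1 (weighted sum + bias) and step 2 (activation), repeated layer by layer
o1 = relu(x @ W1 + b1)          # outputs of the first hidden layer (3 values)
o2 = relu(o1 @ W2 + b2)         # outputs of the second hidden layer (2 values)
y_hat = sigmoid(o2 @ W3 + b3)   # output of the output layer (1 value)

print(y_hat)
```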
Sometimes we get a wrong result or prediction, and this is a problem. To measure how wrong the output is, we
use a loss or cost function. A loss or cost function is nothing but the difference between our actual value
and our predicted value. After getting the loss value by applying the loss function, we try to reduce it and
make it as close to zero as possible. To reduce it, we use an optimizer.
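For example, with a single actual value and a single predicted value, a mean squared error loss (one common choice; the text does not fix a particular loss function) is simply:

```python
def mse_loss(y_true, y_pred):
    # squared difference between the actual and the predicted value
    return (y_true - y_pred) ** 2

print(mse_loss(1.0, 0.7))  # about 0.09 -- the optimizer will try to push this toward zero
```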
Now let's name the outputs.
Outputs of the first hidden layer:
O11(1), O11(2) (first neuron), O22(1), O22(2) (second neuron), O33(1), O33(2) (third neuron)
Outputs of the second hidden layer:
O111(1) (first neuron), O222(1) (second neuron)
The output of the output layer neuron is O1111.
In back propagation, we try to reduce the loss by updating the weights with the help of an optimizer. The
optimizer tries to update all the weights in such a way that the predicted value becomes equal to the actual
value, or close to it. This way we get a proper output, and this is how an ANN model actually works.
Now let's see how this weight updating happens.
Here we will use gradient descent as the optimizer.
Formula:
Wt = Wt-1 - η(dL/dWt-1)
Here,
Wt = new weight
Wt-1 = previous weight
η = learning rate
dL/dWt-1 = slope, i.e. the derivative of the loss with respect to the previous weight
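In code, this update rule is a single line. A minimal sketch, assuming the slope dL/dWt-1 has already been computed (for example by the chain rule described next):

```python
def gradient_descent_step(w_old, grad, learning_rate=0.01):
    # Wt = Wt-1 - eta * (dL/dWt-1)
    return w_old - learning_rate * grad

print(gradient_descent_step(0.5, 2.0))  # 0.5 - 0.01*2.0 = 0.48
```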
We are updating the weights to reduce the loss. For the weight we want to update, we have to take all those
output values which are impacted by that weight. Suppose here the weights between the second hidden layer and
the output layer are W111(1) and W222(1). These weights impact only the O1111 output. So to update such a
weight, in the formula where we find the slope (dL/dWt-1), we have to take all those output values which that
particular weight impacts. Here W111(1) and W222(1) impact only the O1111 output, so we have to take the value
of the O1111 output in the slope (dL/dWt-1) to update the W111(1) or W222(1) weights.
So according to the chain rule, the weight update for W111(1) is:
W111(1)new = W111(1)old - η{dL/dW111(1)old}
=> W111(1)new = W111(1)old - η[(dL/dO1111)*(dO1111/dW111(1)old)]
For W222(1):
W222(1)new = W222(1)old - η{dL/dW222(1)old}
=> W222(1)new = W222(1)old - η[(dL/dO1111)*(dO1111/dW222(1)old)]
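For instance, if the output neuron were linear and the loss were a squared error (simplifying assumptions made here only to keep the derivative short, not something the text specifies), dO1111/dW111(1) is just O111(1), so dL/dW111(1) = 2*(O1111 - actual)*O111(1). A tiny sketch:

```python
def grad_W111_1(o1111, actual, o111_1):
    # dL/dW111(1) = (dL/dO1111) * (dO1111/dW111(1))
    #             = 2*(O1111 - actual) * O111(1)   (squared-error loss, linear output neuron)
    return 2 * (o1111 - actual) * o111_1

print(grad_W111_1(0.7, 1.0, 0.4))  # about -0.24
```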
Now, if we want to update the weights of the second hidden layer's first neuron, what will happen?
See, here the weights coming into the second hidden layer's first neuron are W11(1), W22(1), and W33(1). These
weights impact the O111(1) and O1111 outputs. So to update these weights we have to take these two output values.
So according to the chain rule, the weight update for W11(1) is:
W11(1)new = W11(1)old - η{dL/dW11(1)old}
=> W11(1)new = W11(1)old - η[(dL/dO1111)*(dO1111/dO111(1))*(dO111(1)/dW11(1)old)]
For W22(1):
W22(1)new = W22(1)old - η{dL/dW22(1)old}
=> W22(1)new = W22(1)old - η[(dL/dO1111)*(dO1111/dO111(1))*(dO111(1)/dW22(1)old)]
For W33(1):
W33(1)new = W33(1)old - η{dL/dW33(1)old}
=> W33(1)new = W33(1)old - η[(dL/dO1111)*(dO1111/dO111(1))*(dO111(1)/dW33(1)old)]
So the chain rule works like this: to update a weight, we have to take all the output values which that
particular weight impacts, and we have to take those output values in the backward direction. Look here: the
O1111 output comes after the O111(1) output, and the O111(1) output comes after the O11(1) output. It means
O11(1), then O111(1), then O1111. But in the formula we first took O1111, which is the last one, then we took
the O111(1) output, and then the O11(1) output. It means O1111 --> O111(1) --> O11(1). So we can say that we
are taking the outputs in the backward direction. That's why we call this process back propagation.
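To see the chain rule and this backward direction concretely, here is a small sketch (assuming linear neurons and a squared-error loss purely to keep the derivatives short; with ReLU each layer would just contribute one extra factor) that computes dL/dW11(1) exactly as in the formula above and checks it against a numerical gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, 0.1, 0.3, 0.8])   # X1..X4
y = 1.0                              # actual value

W1 = rng.normal(size=(4, 3))   # input -> first hidden layer
W2 = rng.normal(size=(3, 2))   # first -> second hidden layer (W11(1), W11(2), ...)
W3 = rng.normal(size=(2, 1))   # second hidden layer -> output (W111(1), W222(1))

def forward(W2_):
    o1 = x @ W1            # first hidden layer outputs
    o2 = o1 @ W2_          # second hidden layer outputs (O111(1), O222(1))
    o3 = (o2 @ W3)[0]      # output layer output (O1111)
    return o1, o2, o3

o1, o2, o3 = forward(W2)
loss = (o3 - y) ** 2

# Chain rule, taken in the backward direction O1111 -> O111(1) -> W11(1):
dL_dO1111    = 2 * (o3 - y)   # dL/dO1111
dO1111_dO111 = W3[0, 0]       # dO1111/dO111(1)
dO111_dW11   = o1[0]          # dO111(1)/dW11(1)
dL_dW11 = dL_dO1111 * dO1111_dO111 * dO111_dW11

# Numerical check of the same derivative (perturb W11(1) slightly and re-run forward)
eps = 1e-6
W2_plus = W2.copy()
W2_plus[0, 0] += eps
_, _, o3_plus = forward(W2_plus)
numerical = ((o3_plus - y) ** 2 - loss) / eps

print(dL_dW11, numerical)   # the two values should agree closely
```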