Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn Github
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model Basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Before starting, we need to know two things:
If we multiply numbers that are between 0 and 1, the result will be smaller than those numbers.
For example:
Suppose we have three numbers 0.1, 0.4, and 0.7. If we multiply these three numbers, the result comes out smaller than all of them,
like: 0.1*0.4*0.7 = 0.028
Here we can see that the result is smaller than the numbers we used for the multiplication. On the other hand, if we multiply numbers that are greater than 1, the result will be greater than those numbers.
For example:
We have three numbers 2, 5, and 8. If we multiply these three numbers, the result comes out greater than all of them, like: 2*5*8 = 80
Here we can see that the result is greater than the numbers we used for the multiplication.
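The same check can be written as a few lines of Python (a minimal sketch; the lists and variable names are just for illustration):

# Repeated multiplication of factors between 0 and 1 shrinks the product,
# while factors greater than 1 make it grow.
small = [0.1, 0.4, 0.7]
large = [2, 5, 8]

product_small = 1.0
for x in small:
    product_small *= x
print(product_small)   # 0.028 -> smaller than every factor

product_large = 1
for x in large:
    product_large *= x
print(product_large)   # 80 -> greater than every factor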
There are two types of gradient problems:
1. Exploding gradient problem
2. Vanishing gradient problem
We know that a neural network has an input layer, hidden layers, and an output layer. The hidden layers and the output layer contain neurons, and each neuron is connected to the neurons of the next layer. If we have multiple hidden layers, the outputs of the first hidden layer's neurons go to the second hidden layer's neurons, the outputs of the second hidden layer go to the third hidden layer's neurons, and this chain continues until we reach the output layer.
In the neurons of the first hidden layer, we multiply the inputs by the weights, sum up those results, and then add a bias. After this, we apply an activation function. This output then goes to the second-layer neurons, where new weights are applied again, and after the calculation another activation function is run. The same thing happens in every neuron of each hidden layer, and finally in the output layer.
So at each neuron of each hidden layer we are multiplying by new weights every time. If we keep the weight values between 0 and 1, the values keep getting smaller after every multiplication. If we keep the weight values greater than 1, the values keep getting larger after every multiplication.
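A minimal NumPy sketch of this chain (the layer sizes and weight scales are assumptions, and the bias and activation function are left out so the effect of repeated multiplication stays visible):

import numpy as np

def forward(x, weight_scale, n_layers=10, size=8):
    # Pass x through n_layers of dense layers; each layer's output feeds the next.
    rng = np.random.default_rng(0)
    a = x
    for layer in range(n_layers):
        W = weight_scale * rng.standard_normal((size, size))
        a = W @ a                              # multiply by this layer's weights
        print(layer, float(np.abs(a).mean()))  # typical size of the layer output
    return a

x = np.ones(8)
forward(x, weight_scale=0.05)   # values shrink toward 0, layer after layer
forward(x, weight_scale=2.0)    # values grow larger and larger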
Formula of optimization (the weight update rule):
Wt = Wt-1 - η (dL/dWt-1)
Here,
Wt = new weight
Wt-1 = previous weight
dL/dWt-1 = gradient of the loss L with respect to the previous weight
η = learning rate
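As a minimal sketch of this rule for a single weight (the quadratic loss below is an assumed toy example, not from the text; only the update line matters):

def loss(w):
    return (w - 3.0) ** 2          # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # dL/dw for the toy loss

w = 0.0                            # previous weight Wt-1
eta = 0.1                          # learning rate η
for step in range(25):
    w = w - eta * grad(w)          # new weight Wt = Wt-1 - η (dL/dWt-1)
print(w)                           # close to 3, the minimum of the toy loss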
When we optimize our model, if we take weight values greater than 1, then the value of dL/dWt-1 becomes very high. Because of this, the new weight never reaches the minimum: the step size is so big that the update keeps jumping past the minimum point and never gets a chance to stop there.
When the value of dL/dWt-1 is very high like this, it is called the exploding gradient problem.
If we take weight values between 0 and 1, then the value of dL/dWt-1 becomes very low. When dL/dWt-1 is too low, there is no difference, or only a negligible difference, between the new weight and the previous weight, so the weights effectively stop updating. This problem is called the vanishing gradient problem.
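This connects back to the multiplication idea at the top: through the chain rule, the gradient that reaches an early layer is roughly a product of one factor per layer. A minimal sketch (the per-layer factors and the layer count are assumed numbers for illustration):

# Factors between 0 and 1 make the gradient vanish;
# factors greater than 1 make it explode.
n_layers = 30

grad_vanish = 1.0
for _ in range(n_layers):
    grad_vanish *= 0.5             # per-layer factor between 0 and 1
print(grad_vanish)                 # about 9e-10 -> vanishing gradient

grad_explode = 1.0
for _ in range(n_layers):
    grad_explode *= 1.5            # per-layer factor greater than 1
print(grad_explode)                # about 1.9e5 -> exploding gradient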