Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn Github
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model Basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
We have already learned about the vanishing gradient problem: it occurs in backpropagation, when we update the weights. We saw this when we learned about ANN. The same problem occurs in a simple RNN while updating the weights. A simple RNN has no long-term memory, so it cannot use information from the distant past and cannot learn patterns with long dependencies. These are the problems of a simple RNN, and to solve them we use LSTM.
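To make the vanishing gradient intuition concrete, here is a minimal NumPy sketch (the per-step factor 0.9 is an assumed illustration, not a value from the text): when backpropagation through time multiplies many factors smaller than 1, the gradient reaching the earliest time steps shrinks toward zero.

```python
import numpy as np

# Backpropagation through time multiplies one derivative factor per time step.
# If each factor is slightly below 1 (0.9 is just an assumed example value),
# the product shrinks toward zero as sequences get longer, so early time steps
# barely influence the weight update.
factor = 0.9
for steps in (10, 50, 100):
    gradient_scale = np.prod(np.full(steps, factor))
    print(f"{steps:>3} steps -> gradient scale {gradient_scale:.6f}")
```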
In the image:
Red color marked area: the long-term memory cell
Sigma (σ): the sigmoid activation function
tanh: the hyperbolic tangent activation function
Cross inside circle: a gate that multiplies values
Plus inside circle: a gate that adds values
Cyan color marked area: the short-term memory cell
Ct-1: the output of the previous long-term memory cell
Ht-1: the previous main output
Ht: the current main output
Ct: the output of the current long-term memory cell
X1: the current input
Three inputs come into the neuron: Ht-1, Ct-1, and X1. Inside each neuron we have a long-term memory cell. Ct-1 is the previous neuron's long-term memory cell output, which we take into the current neuron's long-term memory cell. Ht-1 (the previous output) and X1 (the current input) come inside the cell and get concatenated.
At first, the concatenated value goes to the purple marked area.
Process that happens in the purple marked area:
After the concatenation, the value goes to the purple marked area, where we run the sigmoid activation function.
Calculation:
Ft=σ(Wf[Ht-1,X1]+Bf)
here
Bf=bias
Wf=weight
Ht-1=Previous output
X1=Current input
σ=Sigmoid activation function
So here we multiply the weight by the concatenated value of Ht-1 and X1 and then add the bias. After that, we run the sigmoid activation function. We know that the sigmoid activation function squashes any value into the range 0 to 1 (negative inputs give outputs below 0.5, approaching 0; positive inputs give outputs of 0.5 or above, approaching 1). Then the result goes to the first gate of the long-term memory cell (the one with the cross sign in the image). We call this gate the forget gate, because it decides which data or patterns the long-term memory should forget, and which it should keep, when a new value comes. We get the new value by combining the new input (X1) with the previous output (Ht-1) and running the sigmoid activation function.
Calculation in the first gate of the long-term memory cell:
Here we multiply the data that is already present in the long-term memory cell by the new data.
Calculation:
B=Ct-1*Ft
Suppose the value of Ft is [1, 0, 1] and the value of Ct-1 is [1, 2, 4].
So the result of the multiplication is [1, 2, 4] * [1, 0, 1] = [1, 0, 4].
Notice that in the result the 2 of Ct-1 became 0, but 1 and 4 stayed as they were. So we can say that among the values 1, 2, 4 the long-term memory will forget the value 2 and remember the values 1 and 4. The sigmoid function converts the Ft values to the range 0 to 1, where 0 means "no" and 1 means "yes". So after the multiplication, some values of Ct-1 become 0 (forget them) and some stay the same (keep them).
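Below is a minimal NumPy sketch of the forget-gate step described above. The weights, bias, and input vectors are made-up illustrative values; only the vectors [1, 2, 4] and [1, 0, 1] repeat the example from the text.

```python
import numpy as np

def sigmoid(x):
    # squashes any value into the range 0..1
    return 1.0 / (1.0 + np.exp(-x))

# Made-up values just to show the data flow (cell size = 3).
Ht_1 = np.array([0.2, -0.4])                 # previous output
X1 = np.array([0.5, 0.1])                    # current input
Ct_1 = np.array([1.0, 2.0, 4.0])             # previous long-term memory (text's example)

concat = np.concatenate([Ht_1, X1])          # [Ht-1, X1]
Wf = np.random.randn(3, concat.size) * 0.1   # forget-gate weights
Bf = np.zeros(3)                             # forget-gate bias

Ft = sigmoid(Wf @ concat + Bf)               # Ft = sigmoid(Wf[Ht-1, X1] + Bf)
B = Ct_1 * Ft                                # element-wise "forget" of old memory

# The idealized example from the text: Ft = [1, 0, 1] keeps 1 and 4, forgets 2.
print(np.array([1.0, 2.0, 4.0]) * np.array([1.0, 0.0, 1.0]))  # [1. 0. 4.]
```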
Process that happens in the blue marked area:
Here the concatenated value of X1 and Ht-1 goes to the blue marked area and passes through two activation functions, sigmoid and tanh.
Calculation:
Im=σ(Wi[Ht-1,X1]+Bi)
here,
Bi=bias
Wi=weight
Ht-1=Previous output
X1=Current input
σ=Sigmoid activation function
So here we multiply the weight by the concatenated value of Ht-1 and X1 and then add the bias. After that, we run the sigmoid activation function, which squashes the values into the range 0 to 1 (negative inputs give outputs below 0.5, positive inputs give outputs of 0.5 or above). On the other side, we run the tanh activation function on the concatenated value of Ht-1 and X1.
Calculation:
Ct'=tanh(Wc[Ht-1,X1]+Bc)
here,
Bc=bias
Wc=weight
Ht-1=Previous output
X1=Current input
tanh=hyperbolic tangent activation function
So here we multiply the weight by the concatenated value of Ht-1 and X1 and then add the bias. After that, we run the tanh activation function, which squashes the values into the range -1 to 1 (negative inputs go toward -1, positive inputs go toward 1).
Now these two outputs (after applying the sigmoid and tanh activation functions) go into a cross sign gate. This gate multiplies the two outputs that came from the sigmoid and tanh activation functions.
Calculation:
A=Ct'*Im
This multiplication works the same way as in the forget gate. After the multiplication, the value goes to the long-term memory cell's second gate. The second gate is a plus sign gate, which does the addition of values.
Now there can be a question: the addition of what?
This gate adds the long-term memory values we got from the forget-gate calculation to the value we got from the calculation we just did.
Another question: why do we do this addition, or why do we use this gate?
Using the forget gate, we determined which values or patterns of our long-term memory should be forgotten when a new value comes; in other words, we removed unnecessary data or patterns. If we remove some data, we may also have to add some data to the long-term memory, if the new data requires it.
Suppose we have a sentence in long-term memory: "I love Jenny and she is a very good girl". Now suppose the new input is Rafsun, which is a male name. For this male name, we need to change the gender-related words "she" and "girl". So according to the new input we have to forget "she" and "girl", which we did using the forget gate, but besides forgetting we also have to add the new words "he" and "boy". This kind of addition happens through the second gate of the long-term memory cell.
Calculation:
Ct=A+B
Suppose the value of B is [1, 0, 4] and the value of A is [0, 3, 0].
So the result of the addition is [1, 0, 4] + [0, 3, 0] = [1, 3, 4].
Notice that in the result the 0 became 3, but 1 and 4 stayed the same. So our new long-term memory values are 1, 3, 4, and we can say that the long-term memory memorized something new, because a 3 now sits in the position of the 0. This is the value of Ct. Now this Ct value will go to the next neuron as Ct-1.
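Continuing the same made-up sketch, the blue-area calculation and the plus-sign gate can be written like this (weights and biases are again illustrative; the vectors [1, 0, 4] and [0, 3, 0] repeat the text's example):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

cell_size = 3
Ht_1 = np.array([0.2, -0.4])              # previous output
X1 = np.array([0.5, 0.1])                 # current input
concat = np.concatenate([Ht_1, X1])

Wi, Bi = np.random.randn(cell_size, concat.size) * 0.1, np.zeros(cell_size)
Wc, Bc = np.random.randn(cell_size, concat.size) * 0.1, np.zeros(cell_size)

Im = sigmoid(Wi @ concat + Bi)            # input gate: how much new info to let in
Ct_candidate = np.tanh(Wc @ concat + Bc)  # candidate values Ct'
A = Ct_candidate * Im                     # scaled new information

B = np.array([1.0, 0.0, 4.0])             # forget-gate result (text's example)
Ct = B + A                                # updated long-term memory

# The idealized example from the text: [1, 0, 4] + [0, 3, 0] = [1, 3, 4]
print(np.array([1.0, 0.0, 4.0]) + np.array([0.0, 3.0, 0.0]))  # [1. 3. 4.]
```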
Process that happens in the green marked area:
From this area, we get the output. After the concatenated value goes to the green marked area, we run the sigmoid activation function there.
Calculation:
ol=σ(Wl[Ht-1,X1]+Bl)
Here,
Bl=bias
Wl=weight
Ht-1=Previous output
X1=Current input
σ=Sigmoid activation function
So here we multiply the weight by the concatenated value of Ht-1 and X1 and then add the bias. After that, we run the sigmoid activation function. We know that the sigmoid activation function squashes values into the range 0 to 1 (negative inputs give outputs below 0.5, positive inputs give outputs of 0.5 or above). After this calculation, the value goes to a cross sign gate, which multiplies values. One value in this multiplication is the result we just computed, and the other is the long-term memory cell value that we calculated a moment ago. But before this multiplication, the long-term memory cell value passes through the tanh activation function.
So the calculation to get Ht is:
Ht=ol*tanh(Ct)
After this calculation, the result is Ht. This Ht output will become the input for the next neuron (in the position of Ht-1).
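A matching sketch for the green-area (output) step, again with illustrative weights and the updated long-term memory from the previous example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Ht_1 = np.array([0.2, -0.4])              # previous output
X1 = np.array([0.5, 0.1])                 # current input
concat = np.concatenate([Ht_1, X1])

Wl, Bl = np.random.randn(3, concat.size) * 0.1, np.zeros(3)
Ct = np.array([1.0, 3.0, 4.0])            # updated long-term memory (text's example)

ol = sigmoid(Wl @ concat + Bl)            # output gate
Ht = ol * np.tanh(Ct)                     # current main output / hidden state
print(Ht)                                 # Ht becomes Ht-1 for the next neuron
```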
This is how the whole process happens in an LSTM.
One disadvantage of LSTM is that it can capture dependencies over roughly 100 time steps, but it struggles when sequences stretch to hundreds or thousands of steps.
GRU stands for Gated Recurrent Unit. It is a modified, lightweight version of LSTM. The difference is that an LSTM has two separate cells, a long-term memory cell and a short-term memory cell, whereas a GRU has only one cell, which holds both long- and short-term memory. An LSTM has three gates (input gate, output gate, and forget gate), while a GRU has two gates: the update gate and the reset gate. The update gate controls how much memory to retain, and the reset gate controls how much past memory to forget.
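In practice, swapping an LSTM layer for a GRU layer is usually a one-line change. A minimal sketch, assuming TensorFlow/Keras (the library choice, layer sizes, and input shape are assumptions, not taken from the text):

```python
import tensorflow as tf

# Toy sequence input: 20 time steps with 8 features each (assumed shape).
inputs = tf.keras.Input(shape=(20, 8))

lstm_out = tf.keras.layers.LSTM(32)(inputs)  # 3 gates + separate long-term cell state
gru_out = tf.keras.layers.GRU(32)(inputs)    # 2 gates (update, reset), single state

lstm_model = tf.keras.Model(inputs, tf.keras.layers.Dense(1)(lstm_out))
gru_model = tf.keras.Model(inputs, tf.keras.layers.Dense(1)(gru_out))

lstm_model.summary()
gru_model.summary()
```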