Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn GitHub
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Suppose our RNN model takes a sequence as input and also produces a sequence as output. In this case, we use the sequence-to-sequence technique.
For example:
If we translate English into Bangla, we need this technique. We also use it to convert an image into text, or text into an image.
A sequence-to-sequence model has two main components: one is the encoder and the other is the decoder.
In the diagram, the black layer is the encoder and the gray layer is the decoder.
How does the encoder work?
The encoder is the input network. It is responsible for taking the input, understanding the relationship between the input elements, and producing an output in the form of a context vector. In the encoder of a sequence-to-sequence model, we get output only from the last time step (the last cell), as the diagram also shows: each time step of the encoder takes the current input and the previous cell's output, but only the last cell gives the final output. Here that output is W, and it is in the form of a context vector, because in an RNN (and therefore in sequence to sequence) text is passed around as vectors. Once we get the context vector W, it is passed to the decoder's first cell.
In the decoder, by contrast, we get an output from every cell. Suppose we pass an English sentence to the encoder. In the first cell we pass the first word A together with a previous state of 0, in the second cell we pass the second word B together with the first cell's output, in the third cell we pass C together with the second cell's output, and so on. From the last cell of the encoder we get the output W in the form of a vector.
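To make the encoder part concrete, here is a minimal sketch in Keras (the framework, the vocabulary size, and the context-vector dimension are all assumptions for illustration, not something these notes fix). The LSTM reads the whole input sequence, but we keep only its final states, which play the role of the context vector W:

# Minimal encoder sketch (assumed Keras; all sizes are placeholders).
from tensorflow.keras import layers

num_encoder_tokens = 1000   # hypothetical English vocabulary size
latent_dim = 256            # hypothetical size of the context vector W

encoder_inputs = layers.Input(shape=(None,), name="encoder_inputs")
enc_emb = layers.Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
# return_state=True: we keep only the states of the last time step,
# i.e. the output W of the last cell.
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]   # the context vector W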
How does the decoder work?
Now this context vector goes to the decoder's first cell, and we get an output X. This output X goes to the second time step, where we get an output Y. Output Y is then passed to the next cell, which produces a new output, and the process continues until we reach the last cell and get its output. So in the decoder, the previous time step's output becomes the input of the current time step.
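A matching decoder sketch (same assumed Keras setup and placeholder sizes as above; the two extra Input tensors stand in for the context vector W coming from the encoder, so that this snippet can stand on its own). Unlike the encoder, every decoder time step produces an output:

# Minimal decoder sketch (assumed Keras; all sizes are placeholders).
from tensorflow.keras import layers

latent_dim = 256            # must match the encoder's context-vector size
num_decoder_tokens = 1200   # hypothetical Bangla vocabulary size

# Stand-ins for the encoder's final states (the context vector W).
state_h = layers.Input(shape=(latent_dim,), name="context_h")
state_c = layers.Input(shape=(latent_dim,), name="context_c")

decoder_inputs = layers.Input(shape=(None,), name="decoder_inputs")
dec_emb = layers.Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
# return_sequences=True: every decoder time step gives an output (X, Y, ...).
dec_out, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=[state_h, state_c])
# One softmax over the target vocabulary per time step.
decoder_probs = layers.Dense(num_decoder_tokens, activation="softmax")(dec_out)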
Here the input and output sequences can have different lengths: we might pass five English words and get seven Bangla words as output, so the number of inputs and outputs can differ. After getting the output we perform backpropagation to reduce the loss, and to reduce the loss we update the weights. From the decoder we expect the target output y, but we actually get a prediction Y, so we compute the loss between them and try to reduce it during backpropagation. We can use LSTM, GRU, or simple RNN cells in the time steps of the encoder and decoder. In the diagram, EOS means end of sequence.
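Putting the two sketches together into one trainable model (again only an assumed Keras illustration; the data arrays named in the comments are hypothetical), the loss between the expected targets y and the predicted outputs Y is exactly what backpropagation reduces by updating the weights:

# End-to-end sequence-to-sequence sketch (assumed Keras; sizes are placeholders).
import tensorflow as tf
from tensorflow.keras import layers

num_encoder_tokens, num_decoder_tokens, latent_dim = 1000, 1200, 256

encoder_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
dec_out, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=[state_h, state_c])
predictions = layers.Dense(num_decoder_tokens, activation="softmax")(dec_out)

model = tf.keras.Model([encoder_inputs, decoder_inputs], predictions)
# Cross-entropy loss between the expected targets (y) and the predictions (Y);
# backpropagation updates the weights to reduce this loss.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit([english_ids, bangla_in_ids], bangla_target_ids, epochs=10)
# (english_ids, bangla_in_ids, bangla_target_ids are hypothetical integer arrays,
#  with bangla_target_ids shifted one step ahead of bangla_in_ids.)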
The major problem is that this technique gives very good accuracy for short sentences but very bad accuracy if we pass a long sentence, say 100 words or more, at a time. Suppose I give you a 120-word sentence and tell you to translate it into Bangla.
Then what will happen?
You will not be able to translate it, because you cannot take in and work with that much information at once. The same thing happens in the encoder and decoder: if we give a very long sentence, the encoder's output W cannot capture all of the information.
For example, the inputs nearest to the output W are C and EOS, so W can capture those two values well. But if we compare A and C in the diagram, it is much harder for W to capture the information of A than of C. Now imagine how difficult it becomes with more than 100 inputs.
The other problem is that the encoder and decoder work in only one direction and cannot use future values, but a word in a sentence may depend on both previous and future words, and this model cannot handle that. To solve these problems we use the attention model.