Learn everything about sequence to sequence in RNN

What is sequence to sequence?

When an RNN model takes a sequence of data as input and also produces a sequence as output, we use the sequence to sequence technique.

For example:
If we translate the English language into Bangla, then we need this technique. To convert an image into text, or text into an image, we also use this technique.

A sequence to sequence model has two main components: an encoder and a decoder.

How does sequence to sequence work?

[Diagram: an encoder-decoder network unrolled over time. The black layers are the encoder and the gray layer is the decoder.]

How does the encoder work?
The encoder is the input network. It is responsible for taking the input sequence, understanding the relationship between the inputs, and producing an output in the form of a context vector. In the encoder of a sequence to sequence model, we get output only from the last time step or last neuron, as the diagram shows. Each time step or neuron of the encoder takes the current input and the previous neuron's output value, but only the last neuron gives the final output. Here that output is W, and it is a context vector, because in an RNN, and in sequence to sequence, text is passed around in the form of vectors. Once we get the context vector W, it is passed to the decoder's first cell.

In the decoder, on the other hand, we get an output from every neuron. Now suppose we are passing an English sentence to the encoder. In the first neuron we pass the first word A together with a previous value of 0, in the second neuron we pass the second word B together with the first neuron's output, and in the third neuron we pass C together with the second neuron's output. This continues in the same way until the last neuron of the encoder, which gives the output W in the form of a vector.
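To make this concrete, below is a minimal Keras sketch of such an encoder. It is only an illustration, not the exact code of this page: the sizes vocab_size, embed_dim and latent_dim are hypothetical placeholders, and an LSTM cell is assumed for the time steps.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, latent_dim = 10000, 128, 256   # hypothetical sizes

# The encoder reads the whole input sentence, token by token.
encoder_inputs = layers.Input(shape=(None,), name="encoder_tokens")
enc_emb = layers.Embedding(vocab_size, embed_dim)(encoder_inputs)

# return_state=True keeps only the final hidden and cell states, i.e. the
# context vector W from the last time step; the per-step outputs are discarded.
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)
context_vector = [state_h, state_c]   # this is "W", which goes to the decoder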

How does the decoder work?
The output W goes to the decoder's first neuron, which produces an output X. This output X goes to the second time step or neuron, which produces an output Y. Output Y is then passed to the next neuron, which produces a new output, and this process continues until we reach the last neuron and get its output. So in the decoder, the previous time step's or neuron's output becomes the input for the current time step or cell.
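Below is a small pure-Python sketch of that loop. The function decoder_step, and the tokens start_token and eos_token, are hypothetical placeholders used only to show how the previous output is fed back in as the next input.

def greedy_decode(context_vector, start_token, eos_token, max_len=50):
    state = context_vector            # W coming from the encoder
    current_input = start_token       # decoding starts from a start-of-sentence token
    output_words = []
    for _ in range(max_len):
        # decoder_step (hypothetical): runs one decoder time step and
        # returns the predicted word plus the new hidden state
        word, state = decoder_step(current_input, state)
        if word == eos_token:         # stop when the decoder predicts EOS
            break
        output_words.append(word)
        current_input = word          # previous output becomes the current input
    return output_words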

The input and output sequences can have different lengths. If we pass five English words, the output may be seven Bangla words, so the number of inputs and the number of outputs do not have to match.

After getting the output we perform back propagation to reduce the loss, and the loss is reduced by updating the weights. From the decoder we expect an output y but we actually get an output Y, so we compute the loss between them and try to reduce it through back propagation. We can use LSTM, GRU or simple RNN cells in the time steps or neurons of the encoder and decoder. In the diagram, EOS means end of sentence.
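To show how all of this fits together, below is a self-contained Keras sketch of an encoder-decoder model prepared for training. All sizes are hypothetical placeholders, LSTM cells are assumed (GRU or SimpleRNN would work the same way), and the decoder is fed the target sentence shifted by one step, which is one common way to train such a model (teacher forcing).

import tensorflow as tf
from tensorflow.keras import layers

vocab_in, vocab_out, embed_dim, latent_dim = 10000, 12000, 128, 256   # hypothetical

# Encoder: keep only the final states, i.e. the context vector W.
encoder_inputs = layers.Input(shape=(None,), name="english_tokens")
enc_emb = layers.Embedding(vocab_in, embed_dim)(encoder_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: starts from W and gives an output at every time step
# (return_sequences=True), so input and output lengths can differ.
decoder_inputs = layers.Input(shape=(None,), name="bangla_tokens")
dec_emb = layers.Embedding(vocab_out, embed_dim)(decoder_inputs)
dec_seq, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=[state_h, state_c])
decoder_outputs = layers.Dense(vocab_out, activation="softmax")(dec_seq)

# The loss between the expected words (y) and the predicted words (Y)
# is what back propagation reduces by updating the weights.
model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit([english_ids, bangla_input_ids], bangla_target_ids, ...)  # hypothetical arrays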

Problems with encoder and decoder

The major problem is that this technique gives very good accuracy for short sentences but very poor accuracy if we pass a long sentence, say 100 words or more, at a time. Suppose I give you a 120-word sentence all at once and ask you to convert it into Bangla.
Then what will happen?
You will not be able to convert it, because you cannot take in and work with that much information at once. The same thing happens in the encoder and decoder: if we give a very long sentence at a time, the encoder's output W cannot capture all the information.

For example, the values nearest to the output W are C and EOS, so W can capture these two values well. But if we compare A and C in the diagram, it is much harder for W to capture the information of A than of C, because A is further away. Now imagine how difficult it becomes with more than 100 inputs.

The other problem is that the encoder and decoder work in only one direction and cannot use future values, yet a word in a sentence may depend on both previous and future words. We cannot handle such cases here. To solve these problems we use the attention model.
