Deep learning activation functions

Some of the most used activation functions

Linear activation function

How linear activation works?

We use a linear activation function when our data is linear in nature; in other words, we use this function for regression problems. Here the output is not limited to any range, so the range is negative infinity to positive infinity. We use the linear activation function only in the output layer.

Formula:
f(x) = x

Range: -infinity to infinity
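To make this concrete, here is a minimal sketch of the linear (identity) activation in Python using NumPy; the function name and the sample values are just for illustration.

import numpy as np

def linear(x):
    # The linear (identity) activation simply passes the input through unchanged.
    return x

print(linear(np.array([-3.0, 0.0, 2.5])))   # [-3.   0.   2.5]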

Sigmoid activation function




How Sigmoid activation works?

It is one of the most widely used non-linear activation functions. We use this function for binary classification, meaning the output is 0 or 1, and we use it in the output layer. The graphical representation of the sigmoid function looks like a curve or S shape. No matter what the value of y (the weighted sum) is, or how large it is, the sigmoid function converts it to a value between 0 and 1. Here 0.5 is the threshold value: if the sigmoid output is less than 0.5 the prediction is 0, meaning the neuron does not get activated, and if it is greater than or equal to 0.5 the prediction is 1, meaning the neuron gets activated.

Formula: sigmoid(y) = 1 / (1 + e^(-y))
In a binary classification problem, we use the sigmoid activation function in the output node, because there the output is 0 or 1 and sigmoid transforms the value to between 0 and 1.
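Below is a minimal NumPy sketch of the sigmoid function and the 0.5 threshold described above; the array values are just example weighted sums.

import numpy as np

def sigmoid(y):
    # Squashes any real-valued input into the range (0, 1).
    return 1 / (1 + np.exp(-y))

y = np.array([-4.0, 0.0, 3.0])        # example weighted sums
probs = sigmoid(y)
print(probs)                          # approx [0.018 0.5   0.953]
print((probs >= 0.5).astype(int))     # apply the 0.5 threshold -> [0 1 1]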

Tanh activation function




How Tanh activation works?

Tanh is a hyperbolic trigonometric function. We use this function for binary classification, and we usually use it in hidden layers. It is very similar to the sigmoid activation function but almost always performs better than sigmoid. Tanh is a mathematically shifted and scaled version of the sigmoid activation function. The range of values in this case is from -1 to 1. In the sigmoid function we saw that the range is 0 to 1 with a middle point or threshold of 0.5: if the value is less than 0.5 sigmoid decides 0, and if it is greater than or equal to 0.5 sigmoid decides 1. In the tanh function the range is -1 to 1 and the middle point or threshold is 0. The shape is also a curve or S shape.
Formula: tanh(z) = (1 - e^(-2z)) / (1 + e^(-2z))
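Here is a small NumPy sketch of the tanh formula above, compared against NumPy's built-in np.tanh; the sample inputs are arbitrary.

import numpy as np

def tanh(z):
    # Same formula as above: (1 - e^(-2z)) / (1 + e^(-2z)), with range (-1, 1).
    return (1 - np.exp(-2 * z)) / (1 + np.exp(-2 * z))

z = np.array([-2.0, 0.0, 2.0])
print(tanh(z))      # approx [-0.964  0.     0.964]
print(np.tanh(z))   # NumPy's built-in gives the same values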

Relu activation function




How ReLU activation works?

The ReLU function is a non-linear activation function. We use this function in the hidden layers. The main advantage of this function is that it does not activate all the neurons at the same time. If the value of y (the weighted sum) is less than or equal to 0, it is transformed into 0, meaning the output is 0 and the neuron is not activated. If y is greater than 0, like 1, 2, 3, 4, 10, 500, etc., the output is that same positive value.

So the range is: 0 to infinity

Suppose y is 2; then the output is 2, meaning activated. If y is 500, the output is 500, meaning activated. And if y is 0, -4, -1, -2, -10, or -500, the output is always 0. This means the neuron is deactivated only if the output of the linear transformation is less than or equal to 0, and activated if it is greater than 0. The ReLU activation function learns much faster than the sigmoid and tanh activation functions.
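The behaviour described above can be written as a one-line NumPy sketch; the sample values mirror the ones in the text.

import numpy as np

def relu(y):
    # Negative (and zero) inputs become 0; positive inputs pass through unchanged.
    return np.maximum(0, y)

y = np.array([-500, -10, 0, 2, 500])
print(relu(y))   # [  0   0   0   2 500]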

Leaky Relu activation function




How Leaky Relu activation works?

The leaky ReLU function is nothing but an improved version of the ReLU function, and we use it in the hidden layers. As we saw, for the ReLU function the gradient is 0 if the value of y is less than or equal to 0, which makes the neurons in that region die (they stop learning). Leaky ReLU was created to overcome this problem. Here, instead of defining the function as 0 for negative inputs, we define it as a small linear component of x. In the ReLU function a negative value such as -1, -1000, or -100000044 always becomes 0, but in leaky ReLU a negative value is not turned into 0; it stays slightly below 0, and the gap from 0 grows as the negative value gets larger.

Formula:
f(x) = ax, if x < 0 (where a is a small constant, for example 0.01)
f(x) = x, otherwise
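A minimal NumPy sketch of leaky ReLU, assuming a small slope a = 0.01 for the negative side (the exact value of a is a tunable choice):

import numpy as np

def leaky_relu(x, a=0.01):
    # For x > 0 return x; for x <= 0 return a * x (a small negative value instead of 0).
    return np.where(x > 0, x, a * x)

x = np.array([-1000.0, -1.0, 0.0, 5.0])
print(leaky_relu(x))   # -10.0  -0.01  0.0  5.0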

Swish activation function



How Swish activation works?

Swish shows better performance than ReLU on deeper models. It was developed by Google and is about as computationally efficient as ReLU. The Swish function is not monotonic. The curve of the function is smooth and the function is differentiable at all points.

Range: -infinity to infinity

Function:
f(x) = x * sigmoid(x)
f(x) = x / (1 + e^(-x))

In Swish, the value of the function may decrease even when the input values are increasing.
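A short NumPy sketch of Swish shows this non-monotonic dip on the negative side; the sample inputs are arbitrary.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def swish(x):
    # x * sigmoid(x), equivalently x / (1 + e^(-x)).
    return x * sigmoid(x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(swish(x))   # approx [-0.033 -0.269  0.     0.731  4.967]
# Note: the output at -1 is lower than at -5, even though the input increased.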

Softmax activation function



How Softmax activation works?

We use this function for multiclass classification, and we use it in the output layer. We can say that softmax is like a combination of multiple sigmoid functions. Sigmoid returns values between 0 and 1, which we can treat as the probability of a data point belonging to a particular class. Softmax squeezes the output of each class between 0 and 1 and also divides by the sum of the outputs. In a multiclass classification problem, the output layer can have any number of neurons, but more than one: the number of neurons is equal to the number of classes in the target. If you have three classes, you will have three neurons in the output layer.

Suppose we have four neurons in the output layer and the outputs from the neurons are 1.4, 0.33, 0.8, and 0.77. After applying the softmax function we get approximately 0.41, 0.14, 0.23, and 0.22.
If you sum the results of the softmax function, you get 1. This happens because the outputs of softmax are interrelated: if the probability of one class increases, the probabilities of the other classes automatically decrease so that the total stays 1.

Formula: Softmax(y_i) = exp(y_i) / Σ_j exp(y_j)
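The example above can be checked with a minimal NumPy sketch of the softmax formula; subtracting the maximum before exponentiating is a common trick for numerical stability.

import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))   # subtract max for numerical stability
    return e / e.sum()

y = np.array([1.4, 0.33, 0.8, 0.77])
p = softmax(y)
print(np.round(p, 2))   # approx [0.41 0.14 0.23 0.22]
print(p.sum())          # 1.0 (up to floating-point rounding)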
