Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn GitHub
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model Basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
In a deep learning model, we have an actual output and a predicted output. The difference between the actual value and the predicted output is measured by the loss function. If we have many outputs, we calculate the loss of each output node and then take the average of all the loss values; this is called the cost function. So the loss function is for a single training example, and the cost function is the average of the loss over all training examples.
Formula: loss function = (y - Y)^2
Formula: cost function = (1/n) * [(y1 - Y1)^2 + (y2 - Y2)^2 + .... + (yn - Yn)^2]
Here,
y = actual value
Y = predicted value
In MSE, we subtract the predicted value from the actual value and then square the result of the subtraction.
Advantage:
1. The MSE loss penalizes the model for making large errors by squaring them.
2. We don't get any local minima.
3. If we plot the quadratic equation, we get a convex curve with only a global minimum, which is convenient for gradient descent.
Disadvantage:
It is not robust to outliers.
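As a rough illustration, here is a minimal NumPy sketch of the MSE cost; the function name and the sample arrays are made up for the example.

```python
import numpy as np

def mse_cost(y_actual, y_pred):
    # squared loss per sample, averaged over all samples (cost function)
    return np.mean((y_actual - y_pred) ** 2)

y = np.array([3.0, 5.0, 2.5, 7.0])   # actual values
Y = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values
print(mse_cost(y, Y))                # 0.875
```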
Formula: loss function = |y - Y|
Formula: cost function = (1/n) * [|y1 - Y1| + |y2 - Y2| + .... + |yn - Yn|]
Here,
y = actual value
Y = predicted value
In MAE, we subtract the predicted value from the actual value and then take the absolute value (modulus) of the result of the subtraction.
Advantage:
The MAE is more robust to outliers as compared to MSE.
Disadvantage:
Computation is more difficult because the absolute value is not differentiable at zero.
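A minimal NumPy sketch of the MAE cost, reusing the same illustrative arrays as assumptions:

```python
import numpy as np

def mae_cost(y_actual, y_pred):
    # absolute loss per sample, averaged over all samples (cost function)
    return np.mean(np.abs(y_actual - y_pred))

y = np.array([3.0, 5.0, 2.5, 7.0])   # actual values
Y = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values
print(mae_cost(y, Y))                # 0.75
```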
Formula: huber loss = { 1/2 * (y - Y)^2,           if |y - Y| <= 𝛅
                        𝛅 * |y - Y| - 1/2 * 𝛅^2,   otherwise }
Here,
(y - Y)^2 = quadratic equation (MSE)
|y - Y| = linear equation (MAE)
𝛅 = a hyperparameter
y = actual value
Y = predicted value
Huber loss is a combination of the MSE and MAE loss functions. If |y - Y| is less than or equal to the delta (𝛅) value, we use the quadratic equation ((y - Y)^2); otherwise we use the linear equation.
Which part of the equation is used depends on the situation.
In one scenario, if the quadratic equation (MSE) makes the model work well, Huber loss will use the quadratic equation ((y - Y)^2); and if it is better to use the linear equation (MAE), Huber loss will use the linear equation (|y - Y|).
Huber loss is more robust to outliers than MSE. For classification problems we use a variant of Huber loss.
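A minimal NumPy sketch of the Huber cost, assuming 𝛅 = 1.0 and illustrative arrays; the last prediction is an outlier so that the linear (MAE-like) branch is exercised:

```python
import numpy as np

def huber_cost(y_actual, y_pred, delta=1.0):
    error = y_actual - y_pred
    # quadratic (MSE-like) branch where |error| <= delta, linear (MAE-like) branch otherwise
    quadratic = 0.5 * error ** 2
    linear = delta * np.abs(error) - 0.5 * delta ** 2
    return np.mean(np.where(np.abs(error) <= delta, quadratic, linear))

y = np.array([3.0, 5.0, 2.5, 7.0])    # actual values
Y = np.array([2.5, 5.0, 4.0, 20.0])   # predicted values; the last one is an outlier
print(huber_cost(y, Y, delta=1.0))
```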
MSLE measures the ratio between the actual value and the predicted value. MSLE only cares about the relative (percentage) difference between the actual and predicted values.
Formula: msle = (1/n) * Σ_{i=1..n} (log(y_i + 1) - log(Y_i + 1))^2
Here, we find the loss by calculating the average of the squared differences between the logarithms of the actual and predicted values (each shifted by 1). This loss function can be a good choice when the target is continuous, i.e. in regression problems.
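A minimal NumPy sketch of the MSLE cost; the arrays are illustrative and assume all values are non-negative so the log is defined:

```python
import numpy as np

def msle_cost(y_actual, y_pred):
    # squared difference of log(value + 1), averaged over all samples
    return np.mean((np.log(y_actual + 1) - np.log(y_pred + 1)) ** 2)

y = np.array([100.0, 1000.0, 10.0])   # actual values
Y = np.array([110.0, 900.0, 12.0])    # predicted values (off by a relative 10-20%)
print(msle_cost(y, Y))
```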
Binary classification means a classification problem where the output is 0/no/false or 1/yes/true.
Formula: bce = -y * log(Y) - (1 - y) * log(1 - Y)
This formula gives us two equations: {-log(1 - Y) if y = 0 and -log(Y) if y = 1}
Here,
y = actual value
Y = predicted value
Here we calculate Y using the sigmoid activation function. If the actual value is 0, the loss becomes -log(1 - Y), and if the actual value is 1, the loss becomes -log(Y).
Cross-entropy loss increases as the predicted probability deviates from the actual label.
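A minimal NumPy sketch of binary cross-entropy over a batch; clipping the predictions is an extra assumption added here to avoid log(0):

```python
import numpy as np

def bce_cost(y_actual, y_pred):
    # clip predicted probabilities so log(0) never occurs (illustrative safeguard)
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    return np.mean(-y_actual * np.log(y_pred) - (1 - y_actual) * np.log(1 - y_pred))

y = np.array([1, 0, 1, 1])            # actual labels
Y = np.array([0.9, 0.2, 0.7, 0.4])    # predicted probabilities (sigmoid outputs)
print(bce_cost(y, Y))
```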
We can use hinge loss as an alternative to cross-entropy. It is mostly used in support vector machines. In hinge loss, the target values are in the set {-1, 1}.
When the sign of the predicted value differs from the sign of the actual value, a much larger error is assigned.
Hinge loss punishes wrong predictions and also punishes correct predictions that are not confident. On some problems, hinge loss gives better performance than cross-entropy.
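The notes above do not spell out the formula; a common definition is max(0, 1 - y*Y) with targets in {-1, 1}, and the sketch below assumes that form with illustrative scores:

```python
import numpy as np

def hinge_cost(y_actual, y_pred):
    # y_actual must be -1 or 1; loss is zero only for confident, correct predictions
    return np.mean(np.maximum(0.0, 1.0 - y_actual * y_pred))

y = np.array([1, -1, 1, -1])            # actual labels in {-1, 1}
Y = np.array([0.8, -2.0, -0.3, 0.4])    # raw model scores
print(hinge_cost(y, Y))                 # 0.725
```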
Squared hinge loss is nothing but the square of the hinge loss output.
Squared hinge loss fits well in binary classification problems.
Squared hinge loss tries to find the boundary that marks the maximum margin between the data points of the various classes.
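A minimal sketch of squared hinge loss, reusing the same assumed max(0, 1 - y*Y) form and illustrative arrays:

```python
import numpy as np

def squared_hinge_cost(y_actual, y_pred):
    # same as hinge loss, but each per-sample loss is squared before averaging
    return np.mean(np.maximum(0.0, 1.0 - y_actual * y_pred) ** 2)

y = np.array([1, -1, 1, -1])            # actual labels in {-1, 1}
Y = np.array([0.8, -2.0, -0.3, 0.4])    # raw model scores
print(squared_hinge_cost(y, Y))
```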
Formula: mce = -(1/n) * Σ_{i=1..n} Σ_{j=1..K} [y_ij * log(p_ij)]
Here,
i = indexes samples or observations (n samples)
j = indexes classes (K classes)
y_ij = actual (one-hot) probability distribution over the K classes
p_ij = predicted probability distribution over the K classes
The multi-class cross-entropy function calculates the average difference between the predicted and actual probability distributions. To get good accuracy, we try to minimize this value.
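A minimal NumPy sketch of the multi-class cross-entropy cost; the one-hot labels and softmax-like probabilities are illustrative assumptions:

```python
import numpy as np

def multiclass_ce_cost(y_actual, p_pred):
    # y_actual: one-hot matrix (n samples x K classes), p_pred: predicted probabilities
    p_pred = np.clip(p_pred, 1e-7, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_actual * np.log(p_pred), axis=1))

y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])                 # actual one-hot labels, n=3 samples, K=3 classes
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])           # predicted probability distributions
print(multiclass_ce_cost(y, p))
```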