Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn GitHub
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model Basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
In a deep learning model, we have an actual output and a predicted output. The difference between the actual value and the predicted output is measured by the loss function. If we have many outputs, we calculate the loss of each output node and then take the average of all the loss values; this is called the cost function. So the loss function is for a single training example, and the cost function is the average of the loss over all training examples.
Formula: loss function = (y - Y)^2
Formula: cost function = (1/n) * [(y1 - Y1)^2 + (y2 - Y2)^2 + .... + (yn - Yn)^2]
Here,
y = actual value
Y = predicted value
In MSE, we subtract the predicted value from the actual value and then square the result of the subtraction.
Advantage:
1. The MSE loss penalizes the model for making large errors by squaring them.
2. We don't get any local minima.
3. If we plot the quadratic equation, we get a convex curve with only a global minimum, which is convenient for gradient descent.
Disadvantage:
It is not robust to outliers.
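As a rough illustration, here is a minimal NumPy sketch of the MSE cost; the function name and the sample arrays are made up for the example.

```python
import numpy as np

def mse_cost(y_actual, y_pred):
    # squared loss per sample, averaged over all samples (cost function)
    return np.mean((y_actual - y_pred) ** 2)

y = np.array([3.0, 5.0, 2.5, 7.0])   # actual values
Y = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values
print(mse_cost(y, Y))                # 0.875
```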
Formula: loss function = |y - Y|
Formula: cost function = (1/n) * [|y1 - Y1| + |y2 - Y2| + .... + |yn - Yn|]
Here,
y = actual value
Y = predicted value
In MAE, we subtract the predicted value from the actual value and then take the absolute value (modulus) of the result of the subtraction.
Advantage:
The MAE is more robust to outliers as compared to MSE.
Disadvantage:
Computation is more difficult because the absolute value is not differentiable at zero.
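A minimal NumPy sketch of the MAE cost, reusing the same illustrative arrays as assumptions:

```python
import numpy as np

def mae_cost(y_actual, y_pred):
    # absolute loss per sample, averaged over all samples (cost function)
    return np.mean(np.abs(y_actual - y_pred))

y = np.array([3.0, 5.0, 2.5, 7.0])   # actual values
Y = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values
print(mae_cost(y, Y))                # 0.75
```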
Formula: huber loss = { 1/2 * (y - Y)^2,           if |y - Y| <= 𝛅
                        𝛅 * |y - Y| - 1/2 * 𝛅^2,   otherwise }
Here,
(y - Y)^2 = quadratic equation (MSE)
|y - Y| = linear equation (MAE)
𝛅 = a hyperparameter
y = actual value
Y = predicted value
Huber loss is a combination of the MSE and MAE loss functions. If |y - Y| is less than or equal to the delta (𝛅) value, we use the quadratic equation ((y - Y)^2); otherwise we use the linear equation.
Which part of the equation is used depends on the situation.
In one scenario, if the quadratic equation (MSE) makes the model work well, Huber loss will use the quadratic equation ((y - Y)^2); and if it is better to use the linear equation (MAE), Huber loss will use the linear equation (|y - Y|).
Huber loss is more robust to outliers than MSE. For classification problems we use a variant of Huber loss.
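A minimal NumPy sketch of the Huber cost, assuming 𝛅 = 1.0 and illustrative arrays; the last prediction is an outlier so that the linear (MAE-like) branch is exercised:

```python
import numpy as np

def huber_cost(y_actual, y_pred, delta=1.0):
    error = y_actual - y_pred
    # quadratic (MSE-like) branch where |error| <= delta, linear (MAE-like) branch otherwise
    quadratic = 0.5 * error ** 2
    linear = delta * np.abs(error) - 0.5 * delta ** 2
    return np.mean(np.where(np.abs(error) <= delta, quadratic, linear))

y = np.array([3.0, 5.0, 2.5, 7.0])    # actual values
Y = np.array([2.5, 5.0, 4.0, 20.0])   # predicted values; the last one is an outlier
print(huber_cost(y, Y, delta=1.0))
```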
MSLE measures the ratio between the actual value and the predicted value. MSLE only cares about the relative (percentage) difference between the actual and predicted values.
Formula: msle = (1/n) * Σ_{i=1..n} (log(y_i + 1) - log(Y_i + 1))^2
Here, we find the loss by calculating the average of the squared differences between the logarithms of the actual and predicted values (each shifted by 1). This loss function can be a good choice when the target is continuous, i.e. in regression problems.
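A minimal NumPy sketch of the MSLE cost; the arrays are illustrative and assume all values are non-negative so the log is defined:

```python
import numpy as np

def msle_cost(y_actual, y_pred):
    # squared difference of log(value + 1), averaged over all samples
    return np.mean((np.log(y_actual + 1) - np.log(y_pred + 1)) ** 2)

y = np.array([100.0, 1000.0, 10.0])   # actual values
Y = np.array([110.0, 900.0, 12.0])    # predicted values (off by a relative 10-20%)
print(msle_cost(y, Y))
```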
Binary classification means a classification problem where the output is 0/no/false or 1/yes/true.
Formula: bce = -y * log(Y) - (1 - y) * log(1 - Y)
This formula gives us two equations: {-log(1 - Y) if y = 0 and -log(Y) if y = 1}
Here,
y = actual value
Y = predicted value
Here we calculate Y using the sigmoid activation function. If the actual value is 0, the loss becomes -log(1 - Y), and if the actual value is 1, the loss becomes -log(Y).
Cross-entropy loss increases as the predicted probability deviates from the actual label.
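A minimal NumPy sketch of binary cross-entropy over a batch; clipping the predictions is an extra assumption added here to avoid log(0):

```python
import numpy as np

def bce_cost(y_actual, y_pred):
    # clip predicted probabilities so log(0) never occurs (illustrative safeguard)
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    return np.mean(-y_actual * np.log(y_pred) - (1 - y_actual) * np.log(1 - y_pred))

y = np.array([1, 0, 1, 1])            # actual labels
Y = np.array([0.9, 0.2, 0.7, 0.4])    # predicted probabilities (sigmoid outputs)
print(bce_cost(y, Y))
```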
We can use hinge loss as an alternative to cross-entropy. It is mostly used in support vector machines. In hinge loss, the target values are in the set {-1, 1}.
When the sign of the predicted value differs from the sign of the actual value, a much larger error is assigned.
Hinge loss punishes wrong predictions and also punishes correct predictions that are not confident. On some problems, hinge loss gives better performance than cross-entropy.
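The notes above do not spell out the formula; a common definition is max(0, 1 - y*Y) with targets in {-1, 1}, and the sketch below assumes that form with illustrative scores:

```python
import numpy as np

def hinge_cost(y_actual, y_pred):
    # y_actual must be -1 or 1; loss is zero only for confident, correct predictions
    return np.mean(np.maximum(0.0, 1.0 - y_actual * y_pred))

y = np.array([1, -1, 1, -1])            # actual labels in {-1, 1}
Y = np.array([0.8, -2.0, -0.3, 0.4])    # raw model scores
print(hinge_cost(y, Y))                 # 0.725
```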
Squared hinge loss is nothing but the square of the hinge loss output.
Squared hinge loss fits well in binary classification problems.
Squared hinge loss tries to find the boundary that marks the maximum margin between the data points of the various classes.
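A minimal sketch of squared hinge loss, reusing the same assumed max(0, 1 - y*Y) form and illustrative arrays:

```python
import numpy as np

def squared_hinge_cost(y_actual, y_pred):
    # same as hinge loss, but each per-sample loss is squared before averaging
    return np.mean(np.maximum(0.0, 1.0 - y_actual * y_pred) ** 2)

y = np.array([1, -1, 1, -1])            # actual labels in {-1, 1}
Y = np.array([0.8, -2.0, -0.3, 0.4])    # raw model scores
print(squared_hinge_cost(y, Y))
```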
Formula: mce = -(1/n) * Σ_{i=1..n} Σ_{j=1..K} [y_ij * log(p_ij)]
Here,
i = indexes samples or observations (n samples)
j = indexes classes (K classes)
y_ij = actual (one-hot) probability distribution over the K classes
p_ij = predicted probability distribution over the K classes
The multi-class cross-entropy function calculates the average difference between the predicted and actual probability distributions. To get good accuracy, we try to minimize this value.
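A minimal NumPy sketch of the multi-class cross-entropy cost; the one-hot labels and softmax-like probabilities are illustrative assumptions:

```python
import numpy as np

def multiclass_ce_cost(y_actual, p_pred):
    # y_actual: one-hot matrix (n samples x K classes), p_pred: predicted probabilities
    p_pred = np.clip(p_pred, 1e-7, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_actual * np.log(p_pred), axis=1))

y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])                 # actual one-hot labels, n=3 samples, K=3 classes
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])           # predicted probability distributions
print(multiclass_ce_cost(y, p))
```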