Learn Python
Learn Data Structures & Algorithms
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Learn Machine learning
Learn GitHub
Learn OpenCV
Introduction
Setup
ANN
Working process of ANN
Propagation
Bias parameter
Activation function
Loss function
Overfitting and Underfitting
Optimization function
Chain rule
Minima
Gradient problem
Weight initialization
Dropout
ANN Regression Exercise
ANN Classification Exercise
Hyperparameter tuning
CNN
CNN basics
Convolution
Padding
Pooling
Data augmentation
Flattening
Create Custom Dataset
Binary Classification Exercise
Multiclass Classification Exercise
Transfer learning
Transfer model basic template
RNN
How RNN works
LSTM
Bidirectional RNN
Sequence to sequence
Attention model
Transformer model
Bag of words
Tokenization & Stop words
Stemming & Lemmatization
TF-IDF
N-Gram
Word embedding
Normalization
POS tagging
Parser
Semantic analysis
Regular expression
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Suppose we are using mini-batch gradient descent and the MSE loss function in backpropagation.
Now let's draw a convergence graph:
On the x-axis we have our weights. Here w1 is the initial weight (purple marked).
If we draw a tangent line at the point of the curve above w1 (black marked), its slope gives us the derivative dL/dw at w1.
Based on this derivative value we increase or decrease the weight so that we move toward the center location (green marked point).
This center point, the green marked point, is called the global minimum.
The global minimum is the best weight point for predicting the actual output. Remember, in backpropagation we update the weights so that we end up with the best weight for predicting the actual output.
In the image, w1 is our current weight, and we keep updating it to reach that best weight.
Now, where is the best weight?
The best weight lies at the global minimum. So in backpropagation the weight updates happen in order to reach the best weight, and this best weight is located at the global minimum.
Here we have to remember one thing: at the global minimum, dL/dw (the derivative of the loss with respect to the weight) is 0, or in other words, the slope is 0.
Now if we put dL/dw = 0 into the weight update formula, then w_new and w_old become equal, because the dL/dw (slope) term is 0. If w_old and w_new are equal, there is no need to update the weight any more, and it is treated as the perfect weight.
Weight update formula:
W_t = W_(t-1) - η * (dL/dW_(t-1))
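To make the update rule concrete, here is a minimal sketch of mini-batch gradient descent with MSE on a one-weight toy model y = w * x. The data, batch size, learning rate, and step count are all illustrative assumptions, not values from this text:

```python
import numpy as np

# Mini-batch gradient descent with MSE on a toy model y = w * x.
# Implements the update W_t = W_(t-1) - eta * dL/dW_(t-1).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 * x                      # true relation: y = 2 * x

w = 5.0                          # initial weight (the "w1" in the graph)
eta = 0.1                        # learning rate
batch_size = 16

for step in range(500):
    idx = rng.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    grad = np.mean(2 * (w * xb - yb) * xb)   # dL/dw for MSE on this mini-batch
    w -= eta * grad                          # W_t = W_(t-1) - eta * dL/dW

print(w)  # close to 2.0; near the global minimum grad ~ 0, so w barely changes
```

Note how the update stalls by itself: once dL/dw is 0, w_new equals w_old, exactly as described above.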
If we use mini-batch gradient descent and MSE, we get the kind of curve we saw in the first image. But if we use a different loss function, we will get a different type of graph.
Let's draw another graph for a different loss function:
In this graph we can see that there are a lot of valleys.
Now the question is: in this case, which point will be the global minimum?
To find it, we have to see which valley comes closest to the x-axis, i.e. which one has the lowest loss. The valley that is nearest to the x-axis holds the global minimum (green marked point). Here the valley with the green marked point is nearest to the x-axis, so that point is the global minimum.
Here we have many valleys.
If one of them holds the global minimum, then what are the others?
The answer: the other low points are called local minima (blue marked points). They are called local minima because, within a specific area of the graph, those points hold the best weight value.
The brown marked points are local maxima.
So we can say that local minima are points that hold the best weight value for a particular region of the graph, while the global minimum is the point that holds the best weight value over the whole graph.
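Here is a minimal sketch of this idea: we scan a toy non-convex "loss" curve (an arbitrary function chosen only for illustration) over a grid of weights and separate the global minimum from the local minima:

```python
import numpy as np

# Toy non-convex "loss" curve with several valleys (illustrative choice).
def loss(w):
    return np.sin(3 * w) + 0.3 * w ** 2

w_grid = np.linspace(-4, 4, 2000)
l_grid = loss(w_grid)

# Global minimum: the lowest loss over the whole grid.
w_global = w_grid[np.argmin(l_grid)]

# Local minima: grid points lower than both of their neighbors
# (the global minimum is also one of these).
interior = (l_grid[1:-1] < l_grid[:-2]) & (l_grid[1:-1] < l_grid[2:])
w_locals = w_grid[1:-1][interior]

print("global minimum near w =", round(float(w_global), 3))
print("local minima near w   =", np.round(w_locals, 3))
```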
Convex Function:
In the image of the convex function we can see two regions, one marked in orange and the other marked in red. Now if we take any two points and connect them, we will see that those two points and all the points that lie between them belong to the same region.
Convex loss functions occur mostly in classical machine learning, e.g. in techniques like linear and logistic regression.
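This can be checked numerically from the definition of convexity, f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) for all t in [0, 1]. A minimal sketch, with an illustrative parabola standing in for a convex loss:

```python
import numpy as np

# Numerically check the convexity inequality
# f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) for t in [0, 1].
def is_convex_between(f, a, b, num=101):
    t = np.linspace(0.0, 1.0, num)
    chord = t * f(a) + (1 - t) * f(b)    # straight line between the two points
    curve = f(t * a + (1 - t) * b)       # the function along that segment
    return bool(np.all(curve <= chord + 1e-12))

convex_f = lambda w: (w - 2) ** 2        # parabola: convex, a single valley
print(is_convex_between(convex_f, -3.0, 5.0))   # True
```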
Non-convex Function:
In the image of the non-convex function we can see two regions, one marked in green and the other marked in yellow. Now if we take two points and connect them, we will see that those two points and the points that lie between them are not all from the same region.
Non-convex loss functions occur mostly in deep learning.
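Reusing the is_convex_between helper and the wavy toy loss from the sketches above, the inequality fails, which confirms that curve is non-convex:

```python
# Reusing is_convex_between and the wavy toy loss from the sketches above.
nonconvex_f = lambda w: np.sin(3 * w) + 0.3 * w ** 2
print(is_convex_between(nonconvex_f, 0.0, 2.0))   # False: the curve rises above the chord
```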