Regularization in machine learning

Why is regularization used?

Regularization is used to help a machine learning model generalize: it penalizes large coefficients so the model fits the training data less aggressively and therefore achieves lower error, and better accuracy, on unseen data.

Lasso or L1 Regularization

The advantage of lasso regression is that it helps to avoid overfitting. Overfitting occurs when a trained model performs well on the training data but poorly on the test data. Lasso works by adding a penalty term to the loss. Under this penalty the coefficients of the independent variables shrink: some move close to zero and some become exactly zero, so the corresponding features effectively vanish from the model.
A natural question is whether losing some features is a problem. The answer is no, because only the features that contribute little to predicting the target variable are driven to zero. Lasso therefore reduces the error (generalizes the model) and also performs feature selection.
Formula: Σ(y - Y)^2 + α Σ|w|
Here,
y = actual value
Y = predicted value
Σ(y - Y)^2 = loss (the sum of squared errors)
α = penalty
Σ|w| = the sum of the absolute values of the coefficients, taken one by one. If there are 3 coefficients, the penalty term is |w1| + |w2| + |w3|.

What is the penalty?
The penalty α is a value you choose, such as 0.1, 1, or 2, that controls how strongly the coefficients are shrunk. A larger penalty shrinks the coefficients more.

How does lasso work?
Suppose you have two data points and you draw a best-fitted line that passes exactly through them. For those two training points the error is zero. But when you test the model on new data, the new points will not lie on that line, so the error becomes greater than zero: the line was best only for the two training points. The goal now is to reduce this error. According to the formula, lasso takes the loss and adds to it the penalty multiplied by the sum of the absolute values of the coefficients. Minimizing this combined quantity shrinks the coefficients, which reduces the error on new data.
You have an equation: y = m1X1 + m2X2 + C = 20X1 + 35X2 + 25.
Here the value of m2 (35) is large. When it is multiplied by X2 it dominates the prediction, and if that feature is not truly important it produces a large error on new data. If you can shrink m2, you can reduce that error. That is exactly what the penalty term, the penalty times the sum of the absolute values of the coefficients, does: after the shrinkage the model is generalized and the error is reduced.
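To make the penalty term concrete, here is a minimal NumPy sketch using the coefficients 20 and 35 from the example equation; the loss and alpha values are made up purely for illustration and are not from the original tutorial.

import numpy as np

# coefficients from the example equation y = 20*X1 + 35*X2 + 25
w = np.array([20.0, 35.0])
alpha = 1            # penalty value (hypothetical)
loss = 10.0          # hypothetical training loss (sum of squared errors)

# lasso (L1) penalty term: alpha times the sum of absolute coefficient values
l1_penalty = alpha * np.sum(np.abs(w))           # 1 * (20 + 35) = 55
print(loss + l1_penalty)                         # penalized loss = 65

# if lasso shrinks m2 from 35 down to 0, the penalty term drops sharply
w_shrunk = np.array([20.0, 0.0])
print(loss + alpha * np.sum(np.abs(w_shrunk)))   # penalized loss = 30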

Lasso is especially useful when there are many features, because it automatically performs feature selection by driving the coefficients of unimportant features to exactly zero.
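As an illustration of this feature selection behaviour, the sketch below fits Lasso with scikit-learn on a small synthetic dataset (the dataset, the alpha value, and the variable names are assumptions, not part of the original tutorial) and prints the learned coefficients; several of them typically come out exactly zero.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# synthetic data: 100 samples, 10 features, only 3 of them actually informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5, random_state=0)

lasso = Lasso(alpha=1)
lasso.fit(X, y)

# coefficients of the uninformative features are typically driven to exactly 0
print(lasso.coef_)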

from sklearn.linear_model import Lasso

# alpha is the penalty value
lsr = Lasso(alpha=1)
# train the model on the training data
lsr.fit(X_train, Y_train)
# evaluate on the test data (the test features go with the test labels)
lsr.score(X_test, Y_test)

Limitations of Lasso:

1. If there are multiple highly collinear variables, lasso tends to keep one of them more or less arbitrarily and shrink the others to zero, which is not a well-founded way of choosing between them.

2. If the number of predictors is greater than the number of observations n, lasso will select at most n predictors as non-zero, even if all of the predictors are relevant.

What is Ridge or L2 Regularization?

The advantage of ridge regression is that it helps to avoid overfitting. Overfitting occurs when the model performs well on the training data but poorly on the test data. Ridge works by adding a penalty term to the loss. Under this penalty the coefficients shrink towards zero, but they never become exactly zero.
Formula:
Σ(y - Y)^2 + α Σ w^2
Here,
y = actual value
Y = predicted value
Σ(y - Y)^2 = loss (the sum of squared errors)
α = penalty
Σ w^2 = the sum of the squared coefficients, taken one by one. If there are 3 coefficients, the penalty term is w1^2 + w2^2 + w3^2.
What is the penalty?
The penalty α is a value such as 0.1, 1, or 2 that controls how strongly ridge shrinks the coefficients. Suppose you have two data points and you draw a best-fitted line that passes exactly through them; the error on those two training points is zero. When you test the model on new data, the new points will not lie on that line, so the error becomes greater than zero, because the line was best only for the two training points. The goal is to reduce this error. According to the formula, ridge takes the loss and adds to it the penalty multiplied by the sum of the squared coefficients. Minimizing this combined quantity shrinks the coefficients and reduces the error on new data.
Example:
You have an equation y = m1X1 + m2X2 + C = 20X1 + 35X2 + 25.
Here the value of m2 (35) is large. When it is multiplied by X2 it dominates the prediction and can produce a large error on new data. If you can shrink m2, the loss is reduced. That is what the penalty term, the penalty times the sum of the squared coefficients, does: after the shrinkage the error of the model is reduced and the model is generalized.

The difference between ridge and lasso is in the penalty term: ridge uses the square of the coefficients (L2) while lasso uses their absolute values (L1). As a result, ridge shrinks the coefficients close to zero but never exactly to zero, while lasso can shrink some coefficients all the way to zero.
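A minimal sketch of that difference, on the same kind of assumed synthetic data as in the earlier example (again, the dataset and alpha value are illustrative assumptions): ridge typically leaves every coefficient non-zero, while lasso sets some of them to exactly zero.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
import numpy as np

# synthetic data: only 3 of the 10 features are informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5, random_state=0)

ridge = Ridge(alpha=1).fit(X, y)
lasso = Lasso(alpha=1).fit(X, y)

# ridge shrinks coefficients but leaves them non-zero; lasso zeroes some out
print("exact zeros in ridge:", np.sum(ridge.coef_ == 0))
print("exact zeros in lasso:", np.sum(lasso.coef_ == 0))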

from sklearn.linear_model import Ridge

# alpha is the penalty value
rr = Ridge(alpha=1)
# train the model on the training data
rr.fit(X_train, Y_train)
# evaluate on the test data (the test features go with the test labels)
rr.score(X_test, Y_test)
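Since the penalty value is chosen by hand (values like 0.1, 1, or 2 as mentioned above), one simple approach is to try several and compare the test scores. This is only a sketch: it assumes X_train, X_test, Y_train, and Y_test already exist from an earlier data split.

from sklearn.linear_model import Ridge

# try a few penalty values and compare the test scores
for alpha in [0.1, 1, 2]:
    model = Ridge(alpha=alpha)
    model.fit(X_train, Y_train)
    print(alpha, model.score(X_test, Y_test))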
