If two or more independent variables in a regression problem are highly correlated with each other,
the situation is called multicollinearity. Because the independent variables are highly correlated,
one independent variable can be predicted from another. The correlation can be positive or negative.
Multicollinearity is mainly a concern in regression problems, where the coefficients of the model are interpreted.
Example:
Take a linear regression model as an example.
Suppose we have two independent variables X1 and X2 and a dependent variable Y.
The equation is:
Y = a0 + A*X1 + B*X2
If we change X1 by one unit while X2 is held constant, Y changes by A units.
Likewise, if we change X2 by one unit while X1 is held constant, Y changes by B units.
Now suppose multicollinearity is present between X1 and X2. When X1 changes by one unit,
X2 changes along with it (and vice versa), so we can no longer hold one variable constant
while varying the other. For this reason we are not able to see the individual effect
of each independent variable on the dependent variable Y.
Multicollinearity may not affect the accuracy of the model much, but because of it the estimated
coefficients no longer show the individual effect of each independent feature on the dependent feature.
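The loss of individual effects can be illustrated with a small simulation. The following is a minimal sketch using NumPy on synthetic data. The true coefficients (a0 = 1, A = 2, B = 3), the noise levels, and the helper names `fit_ols` and `coefficient_spread` are assumptions made for illustration only. It fits the model Y = a0 + A*X1 + B*X2 on many random samples and measures how much the estimate of A varies from sample to sample.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients [a0, A, B] for y = a0 + A*X1 + B*X2."""
    design = np.column_stack([np.ones(len(X)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def coefficient_spread(correlated, n_trials=200, n_samples=100, seed=0):
    """Standard deviation of the fitted A across repeated synthetic samples."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_trials):
        x1 = rng.normal(size=n_samples)
        if correlated:
            # X2 is almost a copy of X1, so the two are highly correlated.
            x2 = x1 + rng.normal(scale=0.05, size=n_samples)
        else:
            x2 = rng.normal(size=n_samples)
        # Assumed true model: a0 = 1, A = 2, B = 3, plus unit-variance noise.
        y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n_samples)
        estimates.append(fit_ols(np.column_stack([x1, x2]), y)[1])
    return float(np.std(estimates))

spread_independent = coefficient_spread(correlated=False)
spread_collinear = coefficient_spread(correlated=True)
print(f"std of estimated A, independent X1/X2: {spread_independent:.3f}")
print(f"std of estimated A, collinear X1/X2:   {spread_collinear:.3f}")
```

With collinear inputs the estimate of A should swing far more widely between samples, which is one way to see that the individual effect of X1 can no longer be pinned down.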
Causes of multicollinearity:
1. If the encoding technique creates dummy variables, it can introduce multicollinearity, for example
when a dummy column is kept for every category alongside an intercept.
2. Creating new features that depend on other features can introduce multicollinearity.
3. Insufficient data can also create a multicollinearity problem.
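The dummy variable cause listed above can be checked directly. The sketch below is only an illustration: the `colors` column and its three levels are hypothetical data. It one-hot encodes the column with NumPy and compares the rank of the design matrix with and without the first dummy column. When every level gets its own column, the dummy columns add up to the intercept column, so one column is perfectly predictable from the others.

```python
import numpy as np

# Hypothetical categorical column with three levels.
colors = np.array(["red", "green", "blue", "green", "red", "blue", "red"])
levels = np.unique(colors)  # sorted unique levels: blue, green, red

# One-hot encode: one 0/1 column per level.
one_hot = (colors[:, None] == levels[None, :]).astype(float)

intercept = np.ones((len(colors), 1))

# Intercept plus a dummy for every level: the dummies sum to the intercept.
full_design = np.hstack([intercept, one_hot])

# Intercept plus all dummies except the first level.
reduced_design = np.hstack([intercept, one_hot[:, 1:]])

rank_full = np.linalg.matrix_rank(full_design)
rank_reduced = np.linalg.matrix_rank(reduced_design)

print("full design:   ", full_design.shape[1], "columns, rank", rank_full)
print("reduced design:", reduced_design.shape[1], "columns, rank", rank_reduced)
```

The full design has more columns than its rank, which signals perfect multicollinearity. Dropping one dummy column removes the redundancy without losing information, because the dropped level is implied when all remaining dummies are zero.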
Ways to reduce multicollinearity:
1. Increase the sample size.
2. If two variables are creating the multicollinearity problem, drop one of them. If several variables are
creating it, keep one of those variables and drop the others.
3. Re-code the form of the independent variables.
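Dropping one of two highly correlated variables can be sketched as follows. This is a minimal example, not a definitive procedure. The helper `drop_correlated`, the 0.9 threshold on the absolute correlation, and the synthetic features X1, X2, X3 are all assumptions chosen for illustration. A feature is kept only if it is not highly correlated, positively or negatively, with any feature already kept.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Keep a feature only if |correlation| with every kept feature is <= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns of X are the features
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = -x1 + rng.normal(scale=0.1, size=200)  # strong negative correlation with x1
x3 = rng.normal(size=200)                   # generated independently of x1 and x2

X = np.column_stack([x1, x2, x3])
X_reduced, kept_names = drop_correlated(X, ["X1", "X2", "X3"])
print(kept_names)
```

Here X2 is dropped because it is almost the negative of X1, while X1 and X3 are kept. Which variable of a correlated pair to keep is a modelling decision; this sketch simply keeps the one that appears first.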