Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Introduction
Setup
Read data
Data preprocessing
Data cleaning
Handle date-time column
Handling outliers
Encoding
Feature_Engineering
Feature selection filter methods
Feature selection wrapper methods
Multicollinearity
Data split
Feature scaling
Supervised Learning
Regression
Classification
Bias and Variance
Overfitting and Underfitting
Regularization
Ensemble learning
Unsupervised Learning
Clustering
Association Rule
Common
Model evaluation
Cross Validation
Parameter tuning
Code Exercise
Car Price Prediction
Flight Fare Prediction
Diabetes Prediction
Spam Mail Prediction
Fake News Prediction
Boston House Price Prediction
Learn Github
Learn OpenCV
Learn Deep Learning
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Feature selection is done because not every feature is equally important to the ML model, and unimportant features can hurt accuracy. So, to get good accuracy, select only the features that play an important role.
Suppose we have three independent features and one dependent feature (the target variable). Using feature selection techniques, we try to find the correlation between each independent feature and the target variable. If an independent feature is correlated with the target variable, we keep that feature; if not, we remove it.
Correlated means that when an independent feature increases, the target variable also consistently increases or decreases, and when the feature decreases, the target moves accordingly. If this happens, we can say the independent feature is correlated with the target variable: when one goes up or down, the other moves with it (positive correlation) or against it (negative correlation).
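As a minimal sketch of this idea, pandas' corr() can be used to check how each independent feature moves with the target (the column names and values below are hypothetical, not the course dataset):

```python
import pandas as pd

# Hypothetical toy data: three independent features and one target column.
df = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "feature_2": [2, 1, 4, 3, 6],
    "feature_3": [9, 7, 5, 3, 1],
    "target":    [10, 14, 22, 26, 33],
})

# Pearson correlation of every column with the target;
# values near +1 or -1 indicate a strong positive or negative relationship.
print(df.corr()["target"].sort_values(ascending=False))
```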
There are many techniques for finding the important features, and they are discussed below.
For categorical columns, encoding needs to be done before applying the feature selection technique. Encoding will be discussed in the next lecture.
This technique removes features that contain constant values, in other words low-variance features. Features that contain a constant value are not useful to a machine learning algorithm.
Low variance or constant feature: not good for the model
High variance or non-constant feature: good for the model
If there are constant or low-variance features, we can remove them with the variance threshold feature selection technique.
Use this technique on numerical columns that contain constant values.
For categorical columns, first convert them into numeric form and then perform variance-based feature selection.
Let's see an example:
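A minimal sketch using scikit-learn's VarianceThreshold (the column names and values below are hypothetical):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: "constant_col" has zero variance.
df = pd.DataFrame({
    "feature_1":    [1, 2, 3, 4, 5],
    "feature_2":    [10, 20, 10, 30, 20],
    "constant_col": [7, 7, 7, 7, 7],
})

# threshold=0 keeps only features whose variance is greater than 0,
# i.e. it drops constant columns.
selector = VarianceThreshold(threshold=0)
selector.fit(df)

# Columns that survive the filter.
kept = df.columns[selector.get_support()]
print(kept.tolist())  # ['feature_1', 'feature_2']
```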
Let's implement it on a real dataset:
See this real-data implementation if you are able to create a project. If not, watch all the previous lectures first and then come back here.
Normally, independent features should be correlated with the dependent feature and uncorrelated with each other. If an independent feature is highly correlated with another independent feature, then one can be predicted from the other. So if two or more independent features are highly correlated with each other, the model needs only one of them, because highly correlated features behave like duplicates.
Suppose three features are highly correlated with each other. In the model, we should keep only one of these three features.
Now, how do we choose which one to keep and which to remove?
The answer is: among these three highly correlated features, we keep the one that is most correlated with the target variable, and the other two are removed.
To find the correlation between independent features, we use threshold values like 0.7, 0.8, etc. Here 0.7 means 70% and 0.8 means 80%. With a threshold, if multiple independent features are correlated with each other above the threshold value, we keep only one of them. Suppose three features are more than 80% correlated with each other. In this case, only one of them will be kept (the one most correlated with the target feature) and the others will be dropped.
Let's see an example:
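A minimal sketch of this idea (the helper function drop_highly_correlated and the data are hypothetical, not part of any library): from each pair of features correlated above the threshold, it drops the one that is less correlated with the target.

```python
import pandas as pd

def drop_highly_correlated(X, y, threshold=0.8):
    """From each pair of features correlated above `threshold`,
    keep the one more correlated with the target and drop the other."""
    corr = X.corr().abs()
    target_corr = X.corrwith(y).abs()
    cols = list(X.columns)
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                # Drop whichever of the pair is less correlated with the target.
                weaker = cols[i] if target_corr[cols[i]] < target_corr[cols[j]] else cols[j]
                to_drop.add(weaker)
    return X.drop(columns=list(to_drop))

# Hypothetical toy data: feature_1 and feature_2 are almost duplicates.
X = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "feature_2": [2, 4, 6, 8, 10],
    "feature_3": [5, 3, 8, 1, 7],
})
y = pd.Series([10, 19, 31, 42, 50], name="target")

print(drop_highly_correlated(X, y, threshold=0.8).columns.tolist())
```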
See this real-data implementation if you are able to create a project. If not, watch all the previous lectures first and then come back here.
In feature selection, we try to find the relationship between the independent and dependent features. If the mutual information between an independent feature and the dependent feature is 0, the two are independent (no relationship); if the value is greater than 0, they are related, and how strongly related depends on the value. Mutual information is calculated based on entropy estimation.
Formula: I(X;Y) = H(X) - H(X|Y)
Here,
I(X;Y) = mutual information of X and Y
H(X) = entropy of X
H(X|Y) = conditional entropy of X given Y
The result is measured in bits when the logarithm is taken base 2 (or in nats with the natural logarithm).
This tells us which features are more important. How many features we use for training is up to us, but take the more important ones to get better accuracy.
Apply this technique on numerical independent features. If your independent columns contain categorical values, don't use this technique. If the dependent column contains categorical data, it is not necessary to convert it into numerical form, but you can do so for better results.
Let's see an example:
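A minimal sketch using scikit-learn's mutual_info_classif for a classification target (the data below is hypothetical; note that scikit-learn reports mutual information in nats, i.e. natural-log units, rather than bits):

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical toy data with a binary target.
X = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5, 6, 7, 8],
    "feature_2": [5, 3, 8, 1, 7, 2, 9, 4],
})
y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# Mutual information of each feature with the target;
# 0 means independent, larger values mean a stronger relationship.
scores = mutual_info_classif(X, y, random_state=0)
print(pd.Series(scores, index=X.columns).sort_values(ascending=False))
```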
See this real-data implementation if you are able to create a project. If not, watch all the lectures first and then come back here.
This technique is used on categorical variables in classification tasks, i.e. it is used to see the relationship between the categorical features and the target feature. The chi-square test gives two values: the chi-square score (statistic) and the p-value. The score should be higher: a feature with a higher score is more important. The p-value should be lower: a feature with a lower p-value is more important.
Because we perform the chi-square test on categorical data, first convert the categories into numerical form using encoding techniques and then apply this feature selection method.
Let's see an example:
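A minimal sketch using scikit-learn's chi2 together with SelectKBest (the data below is hypothetical and already label-encoded; chi2 requires non-negative values):

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data: categorical features already encoded as integers.
X = pd.DataFrame({
    "feature_1": [0, 1, 0, 1, 0, 1, 0, 1],
    "feature_2": [2, 2, 1, 0, 2, 1, 0, 0],
})
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1])

# chi2 returns the chi-square statistic and the p-value for each feature.
scores, p_values = chi2(X, y)
print(pd.DataFrame({"chi2": scores, "p_value": p_values}, index=X.columns))

# SelectKBest keeps the k features with the highest chi-square scores.
best = SelectKBest(score_func=chi2, k=1).fit(X, y)
print(X.columns[best.get_support()].tolist())
```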
Watch this real-data implementation if you are able to create a project. If not, watch all the lectures first and then come back here.
This technique gives a score for each feature of our data: the higher the score, the more relevant the feature. In other words, a feature with a high score is more important and a feature with a low score is less important.
Let's see an example:
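The exact scoring method is not named here; as one common example, a tree ensemble's feature_importances_ gives a per-feature score where higher means more relevant. A minimal sketch with hypothetical data:

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical toy data with a binary target.
X = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5, 6, 7, 8],
    "feature_2": [5, 3, 8, 1, 7, 2, 9, 4],
    "feature_3": [0, 0, 1, 1, 0, 1, 1, 0],
})
y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# Fit a tree ensemble and read one importance score per feature;
# a higher score means the feature is more relevant.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

scores = pd.Series(model.feature_importances_, index=X.columns)
print(scores.sort_values(ascending=False))
```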
Watch this real-data implementation if you are able to create a project. If not, watch all the lectures first and then come back here.