Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Introduction
Setup
Read data
Data preprocessing
Data cleaning
Handle date-time column
Handling outliers
Encoding
Feature Engineering
Feature selection filter methods
Feature selection wrapper methods
Multicollinearity
Data split
Feature scaling
Supervised Learning
Regression
Classification
Bias and Variance
Overfitting and Underfitting
Regularization
Ensemble learning
Unsupervised Learning
Clustering
Association Rule
Common
Model evaluation
Cross Validation
Parameter tuning
Code Exercise
Car Price Prediction
Flight Fare Prediction
Diabetes Prediction
Spam Mail Prediction
Fake News Prediction
Boston House Price Prediction
Learn Github
Learn OpenCV
Learn Deep Learning
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Suppose there is a car: its color is blue, it has four wheels and one engine. Color, wheels, and engine are features of the car. These features can be used to predict the car's price, because the price depends on them. So here price is a dependent feature, since it depends on the color and engine features. Color and engine are independent features because they do not depend on any other feature.
Sometimes data has many columns or features, but not all of them are required or strongly correlated with the target. You have to keep the features that are correlated with, i.e. have an impact on, the target. Removing the unnecessary features is called feature reduction. Sometimes creating new features is also required to get better accuracy. New features can be created by splitting a feature, combining multiple features, or changing the type of a feature. All of these tasks come under feature engineering.
Sometimes the existing features don't give good accuracy or results. To get better performance, we sometimes create or develop new features.
Feature and variable are the same thing. In machine learning we commonly say feature, and in statistics we commonly say variable, but the meaning and use of both are the same.
Let's see an example:
In the table below, Aircraft boarding time and Aircraft reach at station are both features (or variables).

Aircraft boarding time | Aircraft reach at station |
---|---|
01:00 | 01:00 |
03:00 | 03:15 |
04:00 | 04:10 |
Numerical variable:
There are two types of numerical variable:
Discrete: Example: integer values, such as a count.
Continuous: Example: float values, such as a price.
Categorical variable:
There are two types of categorical variable:
Ordinal: the categories have an order, like Sunday, Monday, Tuesday,
or grades like A, B, C.
Nominal: the categories have no order, like colors red, blue, green.
Date and Time
Mixed
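The variable types above map directly onto pandas dtypes. As a minimal sketch, here is a hypothetical dress dataset (the column names are invented for illustration) showing how each type looks when you inspect a DataFrame:

```python
import pandas as pd

# Hypothetical dress dataset covering the variable types described above
df = pd.DataFrame({
    "price": [49.99, 25.50, 80.00],        # numerical, continuous (float)
    "num_pockets": [2, 0, 4],              # numerical, discrete (integer)
    "grade": ["A", "C", "B"],              # categorical, ordinal
    "color": ["red", "blue", "green"],     # categorical, nominal
    "made_on": pd.to_datetime(
        ["2021-01-05", "2021-03-10", "2021-06-20"]
    ),                                     # date and time
})

# dtypes shows which columns are numerical, object (categorical), or datetime
print(df.dtypes)
```

Checking `df.dtypes` like this is usually the first step before deciding how to clean or encode each column.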
An independent variable/feature is a variable in a dataset that doesn't depend on the other variables or features in the dataset. A dependent variable/feature is one that depends on other features in the dataset. Suppose we have four features of a dress: 1. Price 2. Name 3. Color 4. Country. Here price is the dependent variable because the price of a dress depends on the other features like name, color, and country; without these features, we can't predict the price. Name, color, and country are independent variables because they don't depend on any other variable or feature.
It's important because without understanding the type of a variable we can't perform the work correctly.
For example:
1. Different techniques are used to fill missing values in categorical and numerical variables. So
without knowing the type of a variable, filling the missing values is not possible.
2. A machine learning model doesn't understand categorical variables. To use a categorical variable
in a machine learning model, it needs to be converted into a numerical variable. So without
knowing the variable types, this is not possible.
3. In feature engineering we need to create new features. Without knowing the types of the variables, it is quite difficult to
create new features.
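Point 1 above can be sketched in pandas: missing values in a numerical column are commonly filled with the mean (or median), while a categorical column is filled with its most frequent value. The toy data below is invented for illustration:

```python
import pandas as pd
import numpy as np

# Toy dataset with one missing numerical value and one missing categorical value
df = pd.DataFrame({
    "price": [100.0, np.nan, 120.0, 110.0],   # numerical
    "color": ["red", "blue", None, "blue"],   # categorical
})

# Numerical column: fill with the mean of the observed values
df["price"] = df["price"].fillna(df["price"].mean())

# Categorical column: fill with the most frequent value (the mode)
df["color"] = df["color"].fillna(df["color"].mode()[0])

print(df)
```

The mean of the observed prices (100, 120, 110) is 110, and the most frequent color is "blue", so those are the values that replace the gaps.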
1. Test the existing features first. If they don't give good results, go for new features.
2. Decide which features should be created.
3. Create the features.
4. Check how the new features work with the model.
5. Improve the features if needed.
Example:
In the table below there are two features: the boarding time and the time the aircraft reached the station. From
these two features, two new features can be created: whether the flight was delayed or on time, and the delay time in
minutes.

Aircraft boarding time | Aircraft reached at station |
---|---|
01:00 | 01:00 |
03:00 | 03:15 |
04:00 | 04:10 |

Creating new features:

Aircraft boarding time | Aircraft reached at station | Delayed or On time | Delay time in minutes |
---|---|---|---|
01:00 | 01:00 | On time | 00:00 |
03:00 | 03:15 | Delayed | 00:15 |
04:00 | 04:10 | Delayed | 00:10 |
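The two new columns in the table can be derived in pandas. This is a minimal sketch with the table's own values; the column names are invented:

```python
import pandas as pd

df = pd.DataFrame({
    "boarding_time": ["01:00", "03:00", "04:00"],
    "arrival_time":  ["01:00", "03:15", "04:10"],
})

# Parse the HH:MM strings into timestamps so they can be subtracted
board = pd.to_datetime(df["boarding_time"], format="%H:%M")
arrive = pd.to_datetime(df["arrival_time"], format="%H:%M")

# New feature 1: delay in minutes
df["delay_minutes"] = (arrive - board).dt.total_seconds() / 60

# New feature 2: delayed or on time, derived from the delay
df["status"] = df["delay_minutes"].apply(
    lambda m: "Delayed" if m > 0 else "On time"
)

print(df)
```

This reproduces the table above: delays of 0, 15, and 10 minutes, with the first flight on time and the other two delayed.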
Feature selection is done because not all features are equally important for the ML model, and
unimportant features can be a cause of bad accuracy. So to get good accuracy, select
the features that play an important role.
Suppose we have three independent features and one dependent feature (the target variable). Using feature
selection techniques, we try to find the correlation between each independent feature and the target
variable. If an independent feature is correlated with the target variable, we keep that feature; if not,
we remove it.
Correlated means that when an independent feature increases, the target variable also consistently
increases (positive correlation) or consistently decreases (negative correlation), and likewise when the
feature decreases. If this happens, we can say that the feature is correlated with the target variable,
because when one moves, the other moves with it in a predictable direction.
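One simple filter based on this idea is to compute the correlation of each independent feature with the target and keep only the strongly correlated ones. Below is a sketch on an invented car dataset, where `engine_size` moves together with `price` but `num_doors` does not; the 0.5 threshold is an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical car data: engine_size rises with price, num_doors does not
df = pd.DataFrame({
    "engine_size": [1.0, 1.4, 1.6, 2.0, 2.5],
    "num_doors":   [4, 2, 4, 4, 2],
    "price":       [10000, 13500, 15000, 21000, 26000],
})

# Pearson correlation of every independent feature with the target
corr_with_price = df.corr()["price"].drop("price")
print(corr_with_price)

# Keep features whose absolute correlation exceeds the chosen threshold
selected = corr_with_price[corr_with_price.abs() > 0.5].index.tolist()
print(selected)
```

On this data, `engine_size` has a correlation with `price` above 0.9 and is kept, while `num_doors` falls below the threshold and is dropped.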
There are many techniques to find the important features, and they are going to be discussed.
For categorical columns, encoding needs to be done before applying a feature selection technique.
Encoding will be discussed in the next lecture.