Learn Python
Learn Data Structure & Algorithm
Learn Numpy
Learn Pandas
Learn Matplotlib
Learn Seaborn
Learn Statistics
Learn Math
Learn MATLAB
Introduction
Setup
Read data
Data preprocessing
Data cleaning
Handling date-time columns
Handling outliers
Encoding
Feature Engineering
Feature selection filter methods
Feature selection wrapper methods
Multicollinearity
Data split
Feature scaling
Supervised Learning
Regression
Classification
Bias and Variance
Overfitting and Underfitting
Regularization
Ensemble learning
Unsupervised Learning
Clustering
Association Rule
Common
Model evaluation
Cross Validation
Parameter tuning
Code Exercise
Car Price Prediction
Flight Fare Prediction
Diabetes Prediction
Spam Mail Prediction
Fake News Prediction
Boston House Price Prediction
Learn Github
Learn OpenCV
Learn Deep Learning
Learn MySQL
Learn MongoDB
Learn Web scraping
Learn Excel
Learn Power BI
Learn Tableau
Learn Docker
Learn Hadoop
Underfitting means the ML model cannot capture the pattern in the data: the best-fitted line passes through, or even near, only a few of the data points. For this reason, the model gives bad predictions on both the training and the test dataset, and this is called underfitting. A model with high bias and low variance is underfitting. Suppose you have five data points in the training dataset and you apply linear regression. If the best-fitted line touches only one of those points, the model will give very bad predictions on the training dataset, and the same thing will happen when new data points arrive from the test dataset. In that case the model is under-fitted: in underfitting, the model gives bad predictions on both the training and the test dataset.
Reasons for Underfitting:
1. The model has high bias and low variance.
2. The training dataset is too small.
3. The data has not been cleaned.
4. The data contains noise.
Techniques to reduce Underfitting (see the sketch after this list):
1. Clean the data and remove the noise.
2. Increase model complexity.
3. Perform feature engineering properly.
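Below is a minimal sketch of underfitting, assuming scikit-learn and NumPy are installed; the quadratic toy dataset and all variable names are illustrative assumptions, not part of the original notes. A straight line is fitted to curved data, so the model has high bias and scores poorly on both the training and the test split:

```python
# Underfitting sketch: a linear model on nonlinear (quadratic) data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)  # quadratic target + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

line = LinearRegression().fit(X_train, y_train)
print("Train R^2:", line.score(X_train, y_train))  # low: high bias
print("Test  R^2:", line.score(X_test, y_test))    # also low: underfitting
```

Both scores come out low because a straight line simply cannot follow the curve, no matter how much data it sees.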
Overfitting means the ML model gives very good accuracy on the training data but very poor accuracy on the test data. A model with low bias and high variance is overfitting.
Suppose you have some data points and you apply polynomial regression, and the resulting curve passes through every one of them. The model will now give 100%, or close to 100%, accuracy on those training points. But when you test the model on the test dataset, or when new data points arrive, the model cannot make good predictions. This happens because the model is overtrained: it cannot handle new points that lie off the curve, so any point even a little away from it is predicted badly. For this reason, the model predicts very well on the training dataset but very badly on the test dataset, and this is called overfitting.
Reasons for Overfitting:
1. The model has high variance and low bias.
2. The training dataset is too small, so the model memorizes it instead of generalizing.
Techniques to reduce Overfitting (see the sketch after this list):
1. Increase the size of the training data.
2. Reduce model complexity.
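Below is a minimal sketch of overfitting under the same assumptions (scikit-learn, a quadratic toy dataset); the degree-15 polynomial and the small sample size are arbitrary illustrative choices. The flexible model nearly memorizes the few training points, so the train score is close to 1.0 while the test score collapses (it can even go negative):

```python
# Overfitting sketch: a high-degree polynomial on a small dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(30, 1))  # small training set: a cause of overfitting
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=30)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)
print("Train R^2:", model.score(X_train, y_train))  # very high: fits the noise
print("Test  R^2:", model.score(X_test, y_test))    # much lower: high variance
```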
So the answer is: a good model gives good accuracy on both the training and the test dataset. Equivalently, when a model has low bias and low variance, we can say it is a good model. We should therefore train the model so that it does not touch or fit every single data point, but instead fits the overall pattern of as many data points as possible.
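As a rough illustration of that sweet spot, here is a sketch (same assumptions as the two sketches above) that cross-validates three polynomial degrees on the quadratic toy data: degree 1 underfits, degree 15 overfits, and the moderate degree 2 gets the best held-out score:

```python
# Model selection sketch: cross-validation picks the degree that balances
# bias and variance instead of fitting every training point.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=60)

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5)  # default scoring: R^2
    print(f"degree={degree:2d}  mean CV R^2 = {scores.mean():.3f}")
```

The middle degree wins on the held-out folds, which is exactly the low-bias, low-variance behavior a good model should show.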