Learn about Underfitting and Overfitting in machine learning


Underfitting happens when an ML model is too simple to capture the pattern in the data, so the best-fitted line passes close to only a few of the data points. As a result, the model gives poor predictions on both the training dataset and the test dataset. A model with high bias and low variance is said to be underfitted. Suppose you have five data points in the training dataset and you apply linear regression. If the best-fitted line fits only one of those points, the model will predict badly on the training dataset, and the same thing will happen when a test dataset or new data points arrive. In that case we say the model is underfitted.

Reasons for Underfitting:
1. The model is too simple, so it has high bias and low variance.
2. The training dataset is too small.
3. The data is not cleaned.
4. The data contains noise.

Techniques to reduce underfitting:
1. Clean the data and remove the noise.
2. Increase model complexity.
3. Perform feature engineering carefully.
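
As a rough illustration of underfitting and of the "increase model complexity" technique, here is a minimal sketch using NumPy. The synthetic data (y = x^2 plus noise), the train/test split, and the helper names `r2_score` and `fit_and_score` are assumptions made for this example, not part of any particular library workflow. A straight line should score poorly on both training and test data, while a degree-2 polynomial should score well on both.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1.0 is a perfect fit, about 0 is no better than the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree and return (train R^2, test R^2)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_r2 = r2_score(y_train, np.polyval(coeffs, x_train))
    test_r2 = r2_score(y_test, np.polyval(coeffs, x_test))
    return train_r2, test_r2


# Synthetic data (an assumption for illustration): y = x^2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x ** 2 + rng.normal(0, 0.5, size=200)
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

# Degree 1: a straight line cannot follow the curve, so it underfits.
linear_train, linear_test = fit_and_score(1, x_train, y_train, x_test, y_test)
# Degree 2: increasing model complexity lets the model capture the pattern.
quad_train, quad_test = fit_and_score(2, x_train, y_train, x_test, y_test)

print(f"linear    train R^2={linear_train:.2f}  test R^2={linear_test:.2f}")
print(f"quadratic train R^2={quad_train:.2f}  test R^2={quad_test:.2f}")
```

The sign to look for is that the linear model is bad on both datasets, not only on the test data.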


Overfitting happens when an ML model gives very good accuracy on the training data but very poor accuracy on the test data. A model with low bias and high variance is said to be overfitted.
Suppose you have some data points and you apply polynomial regression. After fitting, the curve passes through every data point, so the model gives 100% or nearly 100% accuracy on those points. But when you test the model on a test dataset, or when new data points arrive, it cannot predict well. This happens because the model is overtrained: it has also learned the noise in the training points, so the curve misses new data points that lie even a little away from it. The model therefore predicts very well on the training dataset but very badly on the test dataset, and this is called overfitting.

Reasons for Overfitting:
1. The model has high variance and low bias.
2. The training dataset is too small.

Techniques to reduce Overfitting:
1. Increase the size of the training data.
2. Reduce model complexity.
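
The two techniques above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the data are synthetic (y = x^2 plus noise), and the polynomial degrees, sample sizes, and helper names (`true_curve`, `train_test_r2`) are chosen for the example. A degree-14 polynomial fitted to 15 points passes through every training point, yet it should generalize much worse than either a simpler degree-2 model or the same degree-14 model trained on far more data.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1.0 is a perfect fit; it can be negative for very bad fits."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def true_curve(x):
    """Underlying pattern assumed for this illustration."""
    return x ** 2


rng = np.random.default_rng(1)
noise = 0.5

# Small training set: 15 evenly spaced points.
x_small = np.linspace(-3, 3, 15)
y_small = true_curve(x_small) + rng.normal(0, noise, size=x_small.size)

# Larger training set: 500 points drawn from the same range.
x_large = rng.uniform(-3, 3, size=500)
y_large = true_curve(x_large) + rng.normal(0, noise, size=x_large.size)

# Held-out test set, never used for fitting.
x_test = rng.uniform(-3, 3, size=200)
y_test = true_curve(x_test) + rng.normal(0, noise, size=x_test.size)


def train_test_r2(degree, x_train, y_train):
    """Fit a polynomial and return (train R^2, test R^2)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_r2 = r2_score(y_train, np.polyval(coeffs, x_train))
    test_r2 = r2_score(y_test, np.polyval(coeffs, x_test))
    return train_r2, test_r2


# Overfitting: degree 14 on 15 points touches every training point.
overfit_train, overfit_test = train_test_r2(14, x_small, y_small)
# Technique 2: reduce model complexity (degree 2, same 15 points).
simple_train, simple_test = train_test_r2(2, x_small, y_small)
# Technique 1: increase the training data (degree 14, 500 points).
more_data_train, more_data_test = train_test_r2(14, x_large, y_large)

print(f"overfit   train R^2={overfit_train:.3f}  test R^2={overfit_test:.3f}")
print(f"simpler   train R^2={simple_train:.3f}  test R^2={simple_test:.3f}")
print(f"more data train R^2={more_data_train:.3f}  test R^2={more_data_test:.3f}")
```

The large gap between the training and test scores of the first model is the symptom of overfitting; the other two models should show a much smaller gap.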

A natural question now is: when can we say that our model is a good model?

The answer is: when our model gives good accuracy on both the training and test datasets. Equivalently, a model with low bias and low variance is a good model. So we should train the model so that it does not pass through every data point exactly but still fits most of them closely.
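
This rule of thumb can be written as a small check that compares the training score with the test score. The function name `diagnose_fit` and the thresholds `good_score` and `max_gap` are hypothetical values picked for illustration; suitable cut-offs depend on the problem and the metric, so treat this as a sketch and not a fixed rule.

```python
def diagnose_fit(train_score, test_score, good_score=0.8, max_gap=0.1):
    """Classify a model from its train/test scores (higher score = better).

    good_score and max_gap are hypothetical thresholds for illustration only.
    """
    if train_score < good_score:
        # Poor even on the data it was trained on: high bias.
        return "underfitting"
    if train_score - test_score > max_gap:
        # Good on training data but much worse on unseen data: high variance.
        return "overfitting"
    # Good on training data and nearly as good on test data.
    return "good fit"


print(diagnose_fit(0.30, 0.25))  # underfitting
print(diagnose_fit(0.99, 0.50))  # overfitting
print(diagnose_fit(0.90, 0.88))  # good fit
```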

© Copyright All rights reserved www.CodersAim.com. Developed by CodersAim.