An outlier is a value that is extremely high or extremely low compared to the other values in a data set.
For example, suppose you have the numbers 11, 15, 22, 34, 20, 31, 29, 100, -150. The values 11, 15, 20, 22,
29, 31 and 34 lie close together. If you compare the differences between these values, you will see that
they are small, so none of these values is an outlier. The two remaining values are 100 and -150. Compared
with the other values, 100 is much higher and -150 is much lower, so both lie far from the rest of the
data. So you can say that 100 and -150 are both outliers.
IQR stands for interquartile range. You have already learned about quartiles. Let's use them to find
outliers.
But keep one thing in mind:
A data value less than Q1-1.5(IQR) or greater than Q3+1.5(IQR) can be considered an
outlier.
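As a rough sketch, the rule above can be written as a small Python check. The function name is_outlier and the parameter k (defaulting to the 1.5 multiplier) are illustrative names, not part of any library, and the quartiles in the usage lines are made-up numbers chosen only to show the behaviour.

```python
def is_outlier(value, q1, q3, k=1.5):
    """Return True if value lies below Q1 - k*IQR or above Q3 + k*IQR."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return value < lower_fence or value > upper_fence


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10 and the fences are -5 and 35
print(is_outlier(50, q1=10, q3=20))  # True, because 50 > 35
print(is_outlier(18, q1=10, q3=20))  # False, because -5 <= 18 <= 35
```

A value that sits exactly on a fence is not flagged here, because the rule says "less than" and "greater than".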
Steps of removing outliers using IQR
Step 1:
Arrange the data in order from lowest to highest and find Q1 and Q3
Step 2:
Find the interquartile range: IQR = Q3-Q1
Step 3:
Multiply IQR by 1.5
Step 4:
Subtract the result of step 3 from Q1 and add it to Q3
Step 5:
Check the data set for any data value that is smaller than Q1-1.5(IQR) or larger than
Q3+1.5(IQR)
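The five steps can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name find_outliers is hypothetical, and it assumes that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when the number of values is odd. Other quartile conventions exist and can give slightly different fences.

```python
from statistics import median


def find_outliers(values, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence, outliers) using the IQR rule."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")

    # Step 1: sort the data and find Q1 and Q3 (median of each half)
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Assumption: with an odd count, the middle value belongs to neither half
    upper_half = data[half + 1:] if n % 2 else data[half:]
    q1 = median(lower_half)
    q3 = median(upper_half)

    # Step 2: interquartile range
    iqr = q3 - q1

    # Step 3: multiply the IQR by k (1.5 by default)
    step = k * iqr

    # Step 4: subtract from Q1 and add to Q3 to get the two fences
    lower_fence = q1 - step
    upper_fence = q3 + step

    # Step 5: collect every value outside the fences
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, q3, lower_fence, upper_fence, outliers


# The nine numbers used in the introduction
print(find_outliers([11, 15, 22, 34, 20, 31, 29, 100, -150]))
# (13.0, 32.5, -16.25, 61.75, [-150, 100])
```

For those nine numbers the sketch flags -150 and 100, which matches the values identified as outliers by eye.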
Example
Find the outliers from the following data set.
11,13,27,35,29,5,70,33
Arrange the data in ascending order:
5,11,13,27,29,33,35,70
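Get Q1 and Q3. Q1 is the median of the lower half (5, 11, 13, 27) and Q3 is the median of the upper half (29, 33, 35, 70).
Q1=(11+13)/2=12
Q3=(33+35)/2=34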
Interquartile range=Q3-Q1=34-12=22
Multiply IQR by 1.5
22*1.5=33
Subtract this value from Q1 and add it to Q3
12-33=-21
34+33=67
Check whether the data set contains any value smaller than -21 or greater than 67. Any such value is
considered an outlier.
No value is smaller than -21, but one value is greater than 67, and that is 70. So 70 is an outlier.
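The worked example can be reproduced with a short Python script. This is only a sketch: the variable names are illustrative, and the quartiles are computed as the medians of the lower and upper halves, as in the calculation above. Library functions that interpolate between values may report slightly different quartiles for the same data.

```python
from statistics import median

data = [11, 13, 27, 35, 29, 5, 70, 33]

# Arrange the data in ascending order
ordered = sorted(data)

# The count is even, so the two halves split cleanly
half = len(ordered) // 2
q1 = median(ordered[:half])  # median of 5, 11, 13, 27
q3 = median(ordered[half:])  # median of 29, 33, 35, 70

iqr = q3 - q1
step = 1.5 * iqr
lower_fence = q1 - step
upper_fence = q3 + step

# Values outside the fences are outliers; the rest is the cleaned data
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
cleaned = [x for x in data if lower_fence <= x <= upper_fence]

print(q1, q3, iqr)               # 12.0 34.0 22.0
print(lower_fence, upper_fence)  # -21.0 67.0
print(outliers)                  # [70]
print(cleaned)                   # [11, 13, 27, 35, 29, 5, 33]
```

The last line shows the removal step: the data set with 70 dropped and the original order kept.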