
Outliers in statistics, data science & data analysis

What is an outlier?

An outlier is a value that is extremely high or extremely low compared to the other values in the data.

For example, take the numbers 11, 15, 22, 34, 20, 31, 29, 100, -150. The values 11, 15, 20, 22, 29, 31 and 34 lie close together: the differences between them are small, so none of them is an outlier. The values 100 and -150 are different. 100 is much larger than the rest and -150 is much smaller, so both lie far from the other values. Therefore 100 and -150 are outliers.

Removing Outliers using IQR

IQR stands for interquartile range, the difference between the third quartile and the first quartile (Q3-Q1). You have already learned about quartiles, so let's use them to find outliers.
Keep one rule in mind:
A data value less than Q1-1.5(IQR) or greater than Q3+1.5(IQR) can be considered an outlier.

Steps of removing outliers using IQR
Step 1:
Arrange the data in order from lowest to highest and find Q1 and Q3

Step 2:
Find the interquartile range: IQR = Q3-Q1

Step 3:
Multiply IQR by 1.5

Step 4:
Subtract the result of step 3 from Q1 and add it to Q3

Step 5:
Check the data set for any data value that is smaller than Q1-1.5(IQR) or larger than Q3+1.5(IQR)
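
The steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names `iqr_fences` and `find_outliers` are made up for this example, and Q1 and Q3 are assumed to be the medians of the lower and upper halves of the sorted data (the middle value is left out when the count is odd). Other quartile conventions exist and can give slightly different limits.

```python
from statistics import median


def iqr_fences(values):
    """Return the (lower, upper) limits Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    data = sorted(values)                # Step 1: order from lowest to highest
    if len(data) < 2:
        raise ValueError("need at least two values to compute quartiles")
    half = len(data) // 2
    q1 = median(data[:half])             # assumption: Q1 = median of lower half
    q3 = median(data[-half:])            # assumption: Q3 = median of upper half
    iqr = q3 - q1                        # Step 2: interquartile range
    margin = 1.5 * iqr                   # Step 3: multiply IQR by 1.5
    return q1 - margin, q3 + margin      # Step 4: lower and upper limits


def find_outliers(values):
    """Step 5: keep the values that fall outside the two limits."""
    lower, upper = iqr_fences(values)
    return [v for v in values if v < lower or v > upper]


print(find_outliers([11, 15, 22, 34, 20, 31, 29, 100, -150]))
```

With the numbers from the first example on this page, the sketch reports 100 and -150, the same two values identified by eye.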


Example
Find the outliers in the following data set.
11,13,27,35,29,5,70,33

Arrange the data in ascending order:
5,11,13,27,29,33,35,70

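Get Q1 and Q3
Q1 is the median of the lower half (5,11,13,27) and Q3 is the median of the upper half (29,33,35,70).
Q1=(11+13)/2=12
Q3=(33+35)/2=34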

Interquartile range=Q3-Q1=34-12=22

Multiply IQR by 1.5
22*1.5=33

Subtract the result from Q1 and add it to Q3
12-33=-21
34+33=67

Check whether the data set contains any value smaller than -21 or greater than 67. Any such value is considered an outlier.
No value is smaller than -21, but one value is greater than 67: 70. So 70 is an outlier.
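
The arithmetic of this example can be checked with a short Python script. It is only a sketch: the variable names are chosen for illustration, and the quartiles are taken as the medians of the lower and upper halves, as in the calculation above.

```python
from statistics import median

data = [11, 13, 27, 35, 29, 5, 70, 33]
ordered = sorted(data)            # 5, 11, 13, 27, 29, 33, 35, 70

half = len(ordered) // 2
q1 = median(ordered[:half])       # median of the lower half
q3 = median(ordered[-half:])      # median of the upper half
iqr = q3 - q1

margin = 1.5 * iqr
lower = q1 - margin               # values below this limit are outliers
upper = q3 + margin               # values above this limit are outliers

outliers = [v for v in ordered if v < lower or v > upper]

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
print("limits:", lower, upper)
print("outliers:", outliers)
```

The script should report the same quartiles, the same limits of -21 and 67, and 70 as the only outlier.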


© Copyright All rights reserved www.CodersAim.com. Developed by CodersAim.