Measure of dispersion in statistics, data science, data analysis, ML, and DL

What is a measure of dispersion?

Dispersion describes how spread out the data are. A measure of dispersion tells you how widely the values are scattered, which helps you understand the distribution of the data. It measures the spread of the data around a chosen central point (the mean, median, or mode), so you can quantify how much the data vary.

What is range and how to calculate it?

The range is the difference between the largest and smallest values in the data.
Formula: Range = Largest value - Smallest value

Example: 4, 6, 7, 10, 15
Range = 15 - 4 = 11
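A minimal Python sketch of the same calculation, using only the built-in `max` and `min` functions on the example data above:

```python
# Example data from the range example
data = [4, 6, 7, 10, 15]

# Range = largest value - smallest value
data_range = max(data) - min(data)
print(data_range)  # 11
```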

Quartiles

Quartiles divide sorted data into four equal parts.
Q1 = 25th percentile: 25% of the data lies below it.
Q2 = 50th percentile (the median): the second quarter of the data lies between Q1 and Q2.
Q3 = 75th percentile: the third quarter of the data lies between Q2 and Q3.
Q4 = 100th percentile (the largest value): the last quarter of the data lies between Q3 and Q4.

To find the quartiles, first sort the data in ascending order.

Example:
Data: 5, 10, 15, 20, 25, 30, 35, 40 (already sorted, 8 values)
Q1:
i = (25*8)/100 = 2, Q1 = (10+15)/2 = 12.5
Q1 is the 25th percentile, so multiply 25 by the number of values (8) and divide by 100. This gives the position i = 2.
Because i is a whole number, take the value at position 2 and the value at the next position, and average them. In the example, the second value is 10 and the third value is 15.

Q2:
i = (50*8)/100 = 4, Q2 = (20+25)/2 = 22.5

Here you use 50 because Q2 is the 50th percentile. The position i is 4, so you average the fourth value (20) and the next value (25).

Q3:
i = (75*8)/100 = 6, Q3 = (30+35)/2 = 32.5

Here you use 75 because Q3 is the 75th percentile. The position i is 6, so you average the sixth value (30) and the next value (35).
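The position method above can be sketched in Python as follows. The function name `percentile_by_position` is hypothetical. The worked example only covers a whole-number position i; for a fractional i, this sketch assumes the common textbook convention of rounding i up and taking the value at that position. Other conventions exist (NumPy's `percentile`, for instance, interpolates by default), so library results can differ slightly.

```python
import math


def percentile_by_position(data, p):
    """Return the p-th percentile (0 < p <= 100) using the position method."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    values = sorted(data)          # quartiles require sorted data
    n = len(values)
    i = p * n / 100                # position, e.g. (25 * 8) / 100 = 2
    if i.is_integer():
        i = int(i)
        if i == n:                 # p = 100: there is no "next" value
            return values[-1]
        # Whole-number position: average the i-th and (i + 1)-th values.
        # Positions are 1-based, Python lists are 0-based.
        return (values[i - 1] + values[i]) / 2
    # Fractional position (assumed convention): round up to the next position.
    return values[math.ceil(i) - 1]


data = [5, 10, 15, 20, 25, 30, 35, 40]
for name, p in (("Q1", 25), ("Q2", 50), ("Q3", 75)):
    print(name, percentile_by_position(data, p))
# Q1 12.5
# Q2 22.5
# Q3 32.5
```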

Interquartile range

The interquartile range (IQR) is the range of the middle 50% of the data, that is, the distance between the first and third quartiles.
The data must be sorted in ascending order. First, find the median of the data. Then split the data into two parts at the median (when the number of values is odd, the median itself belongs to neither part). Then find the median of each part: the median of the first (lower) part is Q1, and the median of the second (upper) part is Q3.

Formula: Interquartile range=Q3-Q1

Example: 2, 4, 6, 7, 10, 11, 12, 14, 16
Median = 10
First part = 2, 4, 6, 7
Second part = 11, 12, 14, 16
Median of the first part:
Q1 = (4+6)/2 = 5
Median of the second part:
Q3 = (12+14)/2 = 13
Interquartile range = Q3 - Q1 = 13 - 5 = 8
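A Python sketch of the median-of-halves method, using the standard library's `statistics.median`. The helper name `iqr_by_halves` is hypothetical; it assumes at least two values and, as in the example, leaves the median out of both parts when the count is odd.

```python
from statistics import median


def iqr_by_halves(data):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    values = sorted(data)                 # data must be in ascending order
    n = len(values)
    half = n // 2
    lower = values[:half]                 # first part: values below the median
    upper = values[half + n % 2:]         # second part: skip the median if n is odd
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1


q1, q3, iqr = iqr_by_halves([2, 4, 6, 7, 10, 11, 12, 14, 16])
print(q1, q3, iqr)  # 5.0 13.0 8.0
```

For the even-length data 5, 10, ..., 40 from the quartile example, this method also gives Q1 = 12.5 and Q3 = 32.5.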

Variance

Variance measures how far a set of numbers is spread out from its mean: it is the average of the squared deviations of the values from the mean. A larger variance means the data are more scattered around the mean, and a smaller variance means they are more tightly clustered around it. That's why variance is called a measure of the spread of data around the mean. Variance is the square of the standard deviation, and it is represented by the symbol σ².

How to calculate variance?

Ex: 20, 34, 30, 35, 45, 40, 55

Step 1: Find the mean.
(20+34+30+35+45+40+55) / 7 = 259 / 7 = 37
Step 2: Subtract the mean from each value.
20-37 = -17, 34-37 = -3, 30-37 = -7, 35-37 = -2, 45-37 = 8, 40-37 = 3, 55-37 = 18
Step 3: Square each difference.
(-17)² = 289, (-3)² = 9, (-7)² = 49, (-2)² = 4, (8)² = 64, (3)² = 9, (18)² = 324
Step 4: Add up all the squared values.
289+9+49+4+64+9+324 = 748
Step 5: Divide the sum by the total number of elements.
748/7 = 106.857
Dividing by the number of elements (n) gives the population variance. For a sample variance, divide by n - 1 instead (748/6 = 124.667).
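The five steps map directly onto a few lines of Python. As a cross-check, the sketch also calls the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1).

```python
from statistics import pvariance, variance

data = [20, 34, 30, 35, 45, 40, 55]
n = len(data)

mean = sum(data) / n                       # Step 1: 259 / 7 = 37.0
deviations = [x - mean for x in data]      # Step 2: subtract the mean
squares = [d ** 2 for d in deviations]     # Step 3: square each difference
total = sum(squares)                       # Step 4: 748.0
population_var = total / n                 # Step 5: divide by n

print(round(population_var, 3))            # 106.857
print(round(pvariance(data), 3))           # 106.857 (same, from the library)
print(round(variance(data), 3))            # 124.667 (sample variance, n - 1)
```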

How to calculate variance using grouped data?

First, find the class boundaries and then the frequency of each class.
Class boundaries: subtract 0.5 from the lower limit of each class and add 0.5 to its upper limit.
Frequency (f): the number of values that fall within the class boundaries.
Midpoint (m): the sum of the two boundaries divided by 2.

| Class boundaries | Frequency (f) | Midpoint (m) | f·m | f·m² |
|---|---|---|---|---|
| 3.5-8.5 | 2 | 6 | 12 | 72 |
| 8.5-13.5 | 3 | 11 | 33 | 363 |
| 13.5-18.5 | 1 | 16 | 16 | 256 |
| 18.5-23.5 | 3 | 21 | 63 | 1323 |
| 23.5-28.5 | 5 | 26 | 130 | 3380 |
| 28.5-33.5 | 4 | 31 | 124 | 3844 |
| Total | N = 18 | | ∑fm = 378 | ∑fm² = 9238 |

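Formula (dividing by n(n - 1) gives the sample variance, written s²):
$s^2 = \frac{n\sum fm^2 - \left(\sum fm\right)^2}{n(n-1)} = \frac{18(9238) - 378^2}{18(18-1)} = \frac{23400}{306} \approx 76.47$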
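A Python sketch of the grouped-data formula, with the class boundaries and frequencies taken from the table. The midpoints are recomputed from the boundaries rather than typed in, which guards against a wrong midpoint slipping into the table.

```python
# (lower boundary, upper boundary, frequency) for each class in the table
classes = [
    (3.5, 8.5, 2),
    (8.5, 13.5, 3),
    (13.5, 18.5, 1),
    (18.5, 23.5, 3),
    (23.5, 28.5, 5),
    (28.5, 33.5, 4),
]

n = sum(f for _, _, f in classes)                                  # N = 18
mids = [(low + high) / 2 for low, high, _ in classes]              # m = 6, 11, ..., 31
sum_fm = sum(f * m for (_, _, f), m in zip(classes, mids))         # 378
sum_fm2 = sum(f * m ** 2 for (_, _, f), m in zip(classes, mids))   # 9238

# Sample variance for grouped data (divides by n(n - 1))
s2 = (n * sum_fm2 - sum_fm ** 2) / (n * (n - 1))
print(round(s2, 2))  # 76.47
```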

Standard deviation

It is the positive square root of the variance. We represent standard deviation by the σ symbol.
How to calculate standard deviation?

Ex: 20, 34, 30, 35, 45, 40, 55

Step 1: Find the mean.
(20+34+30+35+45+40+55) / 7 = 259 / 7 = 37
Step 2: Subtract the mean from each value.
20-37 = -17, 34-37 = -3, 30-37 = -7, 35-37 = -2, 45-37 = 8, 40-37 = 3, 55-37 = 18
Step 3: Square each difference.
(-17)² = 289, (-3)² = 9, (-7)² = 49, (-2)² = 4, (8)² = 64, (3)² = 9, (18)² = 324
Step 4: Add up all the squared values.
289+9+49+4+64+9+324 = 748
Step 5: Divide the sum by the total number of elements.
748/7 = 106.857
Step 6: Take the square root.
√106.857 ≈ 10.34
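The same six steps in Python, with the standard library's `statistics.pstdev` (population standard deviation) as a cross-check:

```python
import math
from statistics import pstdev

data = [20, 34, 30, 35, 45, 40, 55]
mean = sum(data) / len(data)                    # Step 1: 37.0
total = sum((x - mean) ** 2 for x in data)      # Steps 2-4: 748.0
population_var = total / len(data)              # Step 5: 106.857...
population_sd = math.sqrt(population_var)       # Step 6: square root

print(round(population_sd, 2))   # 10.34
print(round(pstdev(data), 2))    # 10.34 (population SD from the library)
```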

How to calculate standard deviation using grouped data?

First, find the class boundaries and then the frequency of each class.
Class boundaries: subtract 0.5 from the lower limit of each class and add 0.5 to its upper limit.
Frequency (f): the number of values that fall within the class boundaries.
Midpoint (m): the sum of the two boundaries divided by 2.

| Class boundaries | Frequency (f) | Midpoint (m) | f·m | f·m² |
|---|---|---|---|---|
| 3.5-8.5 | 2 | 6 | 12 | 72 |
| 8.5-13.5 | 3 | 11 | 33 | 363 |
| 13.5-18.5 | 1 | 16 | 16 | 256 |
| 18.5-23.5 | 3 | 21 | 63 | 1323 |
| 23.5-28.5 | 5 | 26 | 130 | 3380 |
| 28.5-33.5 | 4 | 31 | 124 | 3844 |
| Total | N = 18 | | ∑fm = 378 | ∑fm² = 9238 |

Formula (sample variance s² and sample standard deviation s, dividing by n(n - 1)):
$s^2 = \frac{n\sum fm^2 - \left(\sum fm\right)^2}{n(n-1)} = \frac{18(9238) - 378^2}{18(18-1)} \approx 76.47$
$s = \sqrt{76.47} \approx 8.74$
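One way to sanity-check the grouped result in Python is to repeat each class midpoint f times and pass the expanded list to `statistics.stdev`, which computes the sample standard deviation (dividing by n - 1). This treats every value in a class as if it sat at the class midpoint, which is the same assumption the grouped formula makes.

```python
from statistics import stdev, variance

# Midpoint (m) and frequency (f) of each class from the table
midpoints = [6, 11, 16, 21, 26, 31]
frequencies = [2, 3, 1, 3, 5, 4]

# Repeat each midpoint f times: 6 appears twice, 11 three times, and so on
expanded = [m for m, f in zip(midpoints, frequencies) for _ in range(f)]

print(len(expanded))                 # 18
print(round(variance(expanded), 2))  # 76.47
print(round(stdev(expanded), 2))     # 8.74
```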

Mean absolute deviation

Mean absolute deviation (MAD) is the average distance between the data elements and their mean.
It is found by calculating the mean of the dataset and then averaging the absolute differences between each data point and that mean. MAD is used as a measure of variability or dispersion in a dataset. To calculate it, first find the mean of the data, then subtract the mean from each element, take the absolute value of each difference, and average those absolute values.

Example: Data: 4, 6, 7, 8, 10
Mean = (4+6+7+8+10)/5 = 7
Deviations from the mean: (4-7) = -3, (6-7) = -1, (7-7) = 0, (8-7) = 1, (10-7) = 3
Here the minus (-) sign means the element lies below the mean. So if you go 3 steps backward from the mean, you get the element 4, and if you go 3 steps forward, you get the element 10.

| X | Deviation (X - µ) | Absolute deviation |
|---|---|---|
| 4 | -3 | 3 |
| 6 | -1 | 1 |
| 7 | 0 | 0 |
| 8 | +1 | 1 |
| 10 | +3 | 3 |
| Sum | 0 | 8 |

In the table, X is each element of the data. After finding the mean, subtract it from each element (X - µ). If you add up all the values of X - µ, the sum is 0, because the positive and negative deviations cancel out, so the plain sum tells you nothing about the spread. That's why you take the absolute value |X - µ| of each deviation, which makes every value non-negative. Adding the absolute values gives 8.

Now divide 8 by the total number of elements.
Formula: Mean absolute deviation = Sum of |X - µ| / total number of elements
MAD = 8/5 = 1.6
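A short Python sketch of the example, printing both sums so you can see why the signed deviations are useless on their own:

```python
data = [4, 6, 7, 8, 10]
mean = sum(data) / len(data)                     # 7.0
deviations = [x - mean for x in data]            # -3, -1, 0, 1, 3 (these sum to 0)
abs_deviations = [abs(d) for d in deviations]    # 3, 1, 0, 1, 3 (these sum to 8)
mad = sum(abs_deviations) / len(data)

print(sum(deviations), sum(abs_deviations), mad)  # 0.0 8.0 1.6
```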

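How to calculate mean absolute deviation for a discrete series?

Here each value X occurs with frequency F. First find the mean: µ = ∑FX / ∑F = 103/20 = 5.15. Then find each absolute deviation |X - µ| and multiply it by its frequency.

| X | F | FX | Absolute deviation | F × absolute deviation |
|---|---|---|---|---|
| 1 | 3 | 3 | 4.15 | 12.45 |
| 2 | 2 | 4 | 3.15 | 6.30 |
| 3 | 1 | 3 | 2.15 | 2.15 |
| 8 | 4 | 32 | 2.85 | 11.40 |
| 7 | 7 | 49 | 1.85 | 12.95 |
| 4 | 3 | 12 | 1.15 | 3.45 |
| Sum | 20 | 103 | | 48.70 |

Mean absolute deviation = ∑F|X - µ| / ∑F = 48.70/20 = 2.435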
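The frequency-weighted version in Python, using the X and F values from the table:

```python
# Values (X) and their frequencies (F) from the table
xs = [1, 2, 3, 8, 7, 4]
fs = [3, 2, 1, 4, 7, 3]

n = sum(fs)                                                    # ∑F = 20
mean = sum(f * x for x, f in zip(xs, fs)) / n                  # ∑FX / ∑F = 103 / 20
weighted_abs = sum(f * abs(x - mean) for x, f in zip(xs, fs))  # ∑F|X - µ|
mad = weighted_abs / n

print(round(mean, 2), round(weighted_abs, 2), round(mad, 3))   # 5.15 48.7 2.435
```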

Coefficient of Variation

The coefficient of variation (CV) is used to compare two or more surveys or test results that have different units, scales, or means. It expresses the standard deviation as a percentage of the mean, so results measured in different units can be compared on the same scale. It shows the degree of variation of a set of data points relative to its mean.
Suppose you are comparing the results of two different tests. If sample X has a CV of 36% and sample Y has a CV of 29%, then sample X has more variation relative to its mean.

Formula: CVar = (Standard deviation / Mean) * 100%

Example:
The mean number of magazines sold over a 6-month period is 172, and the standard deviation is 10. The mean of the bonuses paid to employees is $6,000, and the standard deviation is $800. Compare the coefficients of variation.

CVar for magazines = (10/172) * 100% = 5.81%
CVar for bonuses = (800/6000) * 100% = 13.33%

The coefficient of variation is larger for the employees' bonuses than for the magazine sales, so the bonuses are more variable than the sales.
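A minimal Python sketch of the comparison. The helper name `coefficient_of_variation` is hypothetical; it simply applies the formula above to the magazine and bonus figures from the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CVar = (standard deviation / mean) * 100, as a percentage."""
    return std_dev / mean * 100


magazines = coefficient_of_variation(10, 172)     # sales of magazines
bonuses = coefficient_of_variation(800, 6000)     # employees' bonuses

print(f"{magazines:.2f}% {bonuses:.2f}%")          # 5.81% 13.33%
print("bonuses more variable:", bonuses > magazines)  # True
```

Because CV is a unit-free percentage, the comparison works even though one measure is a count of magazines and the other is in dollars.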

© Copyright All rights reserved www.CodersAim.com. Developed by CodersAim.