Class limit:
Formula: CL = (largest number - smallest number) / gap
Suppose the largest number is 200, the smallest number is 50, and you want a gap of 5.
Then:
CL = (200 - 50) / 5 = 30
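The formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `number_of_classes` is made up for illustration and is not part of any library.

```python
def number_of_classes(largest, smallest, gap):
    """Range of the data divided by the chosen gap (the CL formula above)."""
    return (largest - smallest) / gap


# Values from the worked example: largest 200, smallest 50, gap 5.
cl = number_of_classes(200, 50, 5)
print(cl)
```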
Boundary:
Subtract 0.5 from the lower class limit and add 0.5 to the upper class limit.
Tally:
One mark for each value that falls inside the boundary.
Frequency:
The number of values that fall inside the boundary, that is, the count of the tally marks.
Cumulative frequency:
The running total of the frequency column. In the example, the first cumulative frequency is 4, which is the
first value of the frequency column. The second cell of the cumulative frequency column is 11, which is the sum
of the previous cumulative frequency (4) and the second frequency (7). This continues until the last row.
Example:
Records of ages of 54 people :54,34,45,56,67,81,45,23,34,56,
76,54,32,82,84,34,56,45,76,34,21,23,85,27,76,53,44,37,82,56,78,57,41,
83,48,46,58,59,63,64,76,58,42,41,35,65,45,78,70,60,50,40,35,50
Class Limit | Boundaries | Tally | Frequency (f) | Cumulative frequency (cf) |
---|---|---|---|---|
23-29 | 22.5-29.5 | \|\|\|\| | 4 | 4 |
30-36 | 29.5-36.5 | \|\|\|\|\| \|\| | 7 | 11 |
37-43 | 36.5-43.5 | \|\|\|\| | 4 | 15 |
44-50 | 43.5-50.5 | \|\|\|\|\| \|\|\|\|\| | 10 | 25 |
51-57 | 50.5-57.5 | \|\|\|\|\| \|\|\| | 8 | 33 |
58-64 | 57.5-64.5 | \|\|\|\|\| \| | 6 | 39 |
65-71 | 64.5-71.5 | \|\|\| | 3 | 42 |
72-78 | 71.5-78.5 | \|\|\|\|\| \| | 6 | 48 |
79-85 | 78.5-85.5 | \|\|\|\|\| \| | 6 | 54 |
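The boundary and cumulative frequency columns can be derived mechanically from the class limits and frequencies. The sketch below copies the class limits and frequencies from the table as they are printed (it does not recount the raw ages); the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies copied from the table above.
class_limits = [(23, 29), (30, 36), (37, 43), (44, 50), (51, 57),
                (58, 64), (65, 71), (72, 78), (79, 85)]
frequencies = [4, 7, 4, 10, 8, 6, 3, 6, 6]

# Boundary: 0.5 below the lower limit and 0.5 above the upper limit.
boundaries = [(low - 0.5, high + 0.5) for low, high in class_limits]

# Cumulative frequency: running total of the frequency column.
cumulative = list(accumulate(frequencies))

for limits, bounds, f, cf in zip(class_limits, boundaries, frequencies, cumulative):
    print(limits, bounds, f, cf)
```

The last cumulative value equals the total number of records, which is a quick check that no class was skipped.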
A measure of central tendency is a single value that represents the center point of a dataset, also called its central location. It tells you where a group of numbers is centered. There are three common measures of central tendency: mean, median, and mode. Each of them locates the center of a dataset in a different way.
The mean is represented by the symbol μ. To find the mean, add all the given values and divide the sum by the
total number of values.
Formula: μ=(X1+X2+...+Xn)/n
Example: Suppose we have 5 numbers: 1, 2, 3, 4, 5. What is the mean?
Ans: (1+2+3+4+5)/5=3
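As a quick check of the formula, the mean can be computed by hand or with the standard library `statistics.mean` function. A minimal sketch using the five numbers from the example:

```python
from statistics import mean

values = [1, 2, 3, 4, 5]

# Sum of all values divided by the number of values.
mean_by_hand = sum(values) / len(values)

# The standard library gives the same result.
mean_from_library = mean(values)

print(mean_by_hand, mean_from_library)
```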
Mean for grouped data:
Boundaries | Frequency (f) | Midpoint (m) | Frequency × Midpoint (fm) |
---|---|---|---|
3.5-8.5 | 5 | 6 | 30 |
8.5-13.5 | 6 | 11 | 66 |
13.5-18.5 | 4 | 16 | 64 |
Total | n=15 | | Σfm=160 |
Here the midpoint is the sum of the two boundary values divided by 2, n is the sum of the frequency column, and Σfm is the sum of the fm column.
So, mean=Σfm/n=160/15=10.67
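The grouped-data calculation can be reproduced step by step. This sketch uses the boundaries and frequencies from the table; the variable names are illustrative.

```python
# Boundaries and frequencies copied from the table above.
boundaries = [(3.5, 8.5), (8.5, 13.5), (13.5, 18.5)]
frequencies = [5, 6, 4]

# Midpoint of each class: (lower boundary + upper boundary) / 2.
midpoints = [(low + high) / 2 for low, high in boundaries]

# Frequency times midpoint for each class.
fm = [f * m for f, m in zip(frequencies, midpoints)]

n = sum(frequencies)          # total number of values
grouped_mean = sum(fm) / n    # sum of fm divided by n

print(midpoints, fm, n, round(grouped_mean, 2))
```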
Sometimes you want the mean of a set of numbers, but some of those numbers should count more than others. In
that case you assign a weight to each number, with larger weights for the more important ones. The mean
computed this way is called the weighted mean.
weighted mean = (Σ wi xi)/Σ wi
Here,
xi=data point
wi=weight of that data point
Let's see an example:
Category | Weight | Scores |
---|---|---|
Quiz | 15 | 88 |
Class Test | 5 | 70 |
Mid exam | 25 | 87 |
Final exam | 30 | 99 |
weighted mean={(15*88)+(5*70)+(25*87)+(30*99)}/(15+5+25+30)
=(1320+350+2175+2970)/75
=6815/75
≈90.87
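The same arithmetic can be written as a small function. This is a sketch under the formula above; `weighted_mean` is a hypothetical helper name, and the scores and weights come from the table.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Quiz, class test, mid exam, final exam (from the table above).
scores = [88, 70, 87, 99]
weights = [15, 5, 25, 30]

result = weighted_mean(scores, weights)
print(round(result, 2))
```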
The weighted average follows the same idea: when some numbers should count more than others, each number is
multiplied by its weight and the total is divided by the sum of the weights.
Formula: (x1w1+x2w2+x3w3+...+xNwN)/W
Here,
x=a particular value
w=weight assigned to that value
W=sum of all weights
Example:
Suppose Jenny scores 83 on the midterm exam and 95 on the final exam. Calculate the weighted average of
Jenny's total score, using a 40% weight for the midterm exam and a 60% weight for the final exam.
The final exam matters more for the grade, which is why it gets the larger weight.
Now let's compute the score:
Weighted average={(83*40%)+(95*60%)}/(40%+60%)=(33.2+57)/1=90.2
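Because the formula divides by the sum of the weights, it does not matter whether the weights are written as 40 and 60 or as 0.4 and 0.6. The sketch below illustrates this with the two exam scores; `weighted_average` is a hypothetical helper name.

```python
import math


def weighted_average(values, weights):
    """Sum of value * weight divided by the total weight W."""
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight


scores = [83, 95]  # midterm, final

as_percent = weighted_average(scores, [40, 60])
as_fraction = weighted_average(scores, [0.4, 0.6])

print(as_percent)
print(math.isclose(as_percent, as_fraction))
```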
The median is the center value of a list of numbers. Before finding the median, arrange the data in ascending
order. If the count of numbers is odd, the center value is the median. If the count is even, add the two
center values and divide the sum by two; the result is the median.
Ex: 1,2,3,4,5
Ans: 3
Ex: 1,2,3,4,5,6
Ans: (3+4)/2=3.5
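The odd and even cases can be sketched directly and compared with the standard library `statistics.median` function. The helper name `median_by_hand` is made up for illustration.

```python
from statistics import median


def median_by_hand(values):
    """Center value for an odd count, mean of the two center values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_case = [1, 2, 3, 4, 5]
even_case = [1, 2, 3, 4, 5, 6]

print(median_by_hand(odd_case), median(odd_case))
print(median_by_hand(even_case), median(even_case))
```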
Median for grouped data:
Formula: Median=L+[{(N/2)-cfp}/fmed]*W
Here,
L=Lower limit of the median class
cfp=Cumulative frequency of class preceding the median class
fmed=Frequency of the median class
W=Width of the median class
N=Total of frequencies
EX:
Records of ages of 54 people :54,34,45,56,67,81,45,23,34,56,
76,54,32,82,84,34,56,45,76,34,21,23,85,27,76,53,44,37,82,56,78,57,41,
83,48,46,58,59,63,64,76,58,42,41,35,65,45,78,70,60,50,40,35,50
Class Limit | Boundaries | Frequency (f) | Cumulative frequency (cf) |
---|---|---|---|
23-29 | 22.5-29.5 | 4 | 4 |
30-36 | 29.5-36.5 | 7 | 11 |
37-43 | 36.5-43.5 | 4 | 15 |
44-50 | 43.5-50.5 | 10 | 25 |
51-57 | 50.5-57.5 | 8 | 33 |
58-64 | 57.5-64.5 | 6 | 39 |
65-71 | 64.5-71.5 | 3 | 42 |
72-78 | 71.5-78.5 | 6 | 48 |
79-85 | 78.5-85.5 | 6 | 54 |
Here,
N=54
Dividing 54 by 2 gives 27. Now find the class whose cumulative frequency first reaches 27. The 4th cumulative
frequency is 25, which is less than 27, and the 5th is 33, which covers it. So the median class is the 5th row
(51-57).
L=51, because the lower class limit in the 5th row is 51.
fmed=8, because the frequency in the 5th row is 8.
cfp=25, because the cumulative frequency in the row before the 5th row is 25.
W=6, because the difference between the class limits is 6.
Median=51+[{(54/2)-25}/8]*6=51+1.5=52.5
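The lookup of the median class and the formula can be sketched as one function. It follows the conventions of the worked example (lower class limit for L, and W=6); `grouped_median` and its parameter names are hypothetical.

```python
from itertools import accumulate


def grouped_median(lower_limits, frequencies, width):
    """Median = L + ((N/2 - cfp) / fmed) * W, using the median class."""
    n = sum(frequencies)
    half = n / 2
    cumulative = list(accumulate(frequencies))
    # Median class: first class whose cumulative frequency reaches N/2.
    idx = next(i for i, cf in enumerate(cumulative) if cf >= half)
    cfp = cumulative[idx - 1] if idx > 0 else 0
    fmed = frequencies[idx]
    return lower_limits[idx] + ((half - cfp) / fmed) * width


# Lower class limits and frequencies copied from the table above.
lower_limits = [23, 30, 37, 44, 51, 58, 65, 72, 79]
frequencies = [4, 7, 4, 10, 8, 6, 3, 6, 6]

print(grouped_median(lower_limits, frequencies, width=6))
```

Many textbooks use the lower class boundary (50.5) and the boundary width (7) instead, which would give a slightly different value; the sketch keeps the convention used in the example above.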
How to calculate mode?
The mode is the value that occurs most often in the dataset. Sorting the data in ascending order first is not
mandatory, but it is good practice because it makes repeated values easy to spot.
Ex: A,B,A,C,D,B,A
Here A appears three times, more than any other value, so A is the mode.
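Counting occurrences is one straightforward way to find the mode. A minimal sketch using `collections.Counter` from the standard library on the letters from the example:

```python
from collections import Counter

data = ["A", "B", "A", "C", "D", "B", "A"]

# Count how many times each value occurs.
counts = Counter(data)

# most_common(1) returns a list with the single most frequent (value, count) pair.
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```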
Mode for grouped data:
Formula: Mode=Lmo+{d1/(d1+d2)}*W
Here,
Lmo=Lower limit of the modal class (the class with the largest frequency)
d1=Frequency of the modal class - frequency of the previous class
d2=Frequency of the modal class - frequency of the next class
W=Width of the modal class (the difference between its class limits)
EX:
Records of ages of 54 people :54,34,45,56,67,81,45,23,34,56,
76,54,32,82,84,34,56,45,76,34,21,23,85,27,76,53,44,37,82,56,78,57,41,
83,48,46,58,59,63,64,76,58,42,41,35,65,45,78,70,60,50,40,35,50
Class Limit | Boundaries | Frequency(f) |
---|---|---|
23-29 | 22.5-29.5 | 4 |
30-36 | 29.5-36.5 | 7 |
37-43 | 36.5-43.5 | 4 |
44-50 | 43.5-50.5 | 10 |
51-57 | 50.5-57.5 | 8 |
58-64 | 57.5-64.5 | 6 |
65-71 | 64.5-71.5 | 3 |
72-78 | 71.5-78.5 | 6 |
79-85 | 78.5-85.5 | 6 |
Here the largest value in the frequency column is 10, so the modal class is 44-50.
So,
Lmo=44
d1=10-4=6
d2=10-8=2
W=6
Mode=44+{6/(6+2)}*6=44+4.5=48.5
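The grouped mode formula can be sketched in the same way. The function below follows the definitions used in the example; `grouped_mode` is a hypothetical name, and treating a missing neighbour class as frequency 0 is an assumption that the example does not exercise.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode = Lmo + (d1 / (d1 + d2)) * W, using the class with the largest frequency."""
    i = frequencies.index(max(frequencies))
    # Assumption: a missing neighbour (first or last class) counts as frequency 0.
    previous_f = frequencies[i - 1] if i > 0 else 0
    next_f = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - previous_f
    d2 = frequencies[i] - next_f
    return lower_limits[i] + (d1 / (d1 + d2)) * width


# Lower class limits and frequencies copied from the table above.
lower_limits = [23, 30, 37, 44, 51, 58, 65, 72, 79]
frequencies = [4, 7, 4, 10, 8, 6, 3, 6, 6]

print(grouped_mode(lower_limits, frequencies, width=6))
```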
If you have a set of numbers and want to know the position of a particular number among them, use a
percentile. Percentiles divide a group of data into 100 parts.
For example, the 80th percentile is a value such that at least 80% of the data lies at or below it and at most
20% of the data lies above it. Likewise, the 95th percentile is a value such that at least 95% of the data lies
at or below it and at most 5% of the data lies above it.
Percentage and percentile are not the same. For example, if a student scores 80 out of 100, the student's
percentage is 80%, but that says nothing about the student's position in the class. Suppose the same student
is at the 90th percentile. That means the student performed better than 90% of the class, while the 80% only
describes the marks obtained.
Formula: Percentile={(number of values below X+0.5)/total number of values}*100
Example:1
The scores of 5 students are given below.
90,61,80,77,85
Find the percentile of 85.
Arrange in ascending order: 61,77,80,85,90
X=85
Number of values below X=3
Total number of values=5
So,
Percentile={(3+0.5)/5}*100=70th percentile
So the student who scored 85 did better than 70% of the students.
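The percentile formula above translates into a short function. This is a sketch; `percentile_of` is a hypothetical helper name, and "below" is taken as strictly less than X, as in the example.

```python
def percentile_of(values, x):
    """((number of values below x) + 0.5) / (total number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


scores = [90, 61, 80, 77, 85]

print(percentile_of(scores, 85))
```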
To find the value that corresponds to a given percentile, follow these steps.
Step 1:
Arrange the data in ascending order.
Step 2:
Put the values in the formula: c=(n*p)/100
Here,
n=total number of values
p=percentile
Step 3:
If c is not a whole number, round it up to the next whole number. The value at that position in the arranged
data is the percentile value.
or,
If c is a whole number, use the value halfway between the cth and (c+1)th values, counting up from the lowest
value.
Example:1
The marks of 5 students are given below.
90, 61, 80, 77, 85
Calculate the value corresponding to the 70th percentile.
Arrange in ascending order: 61, 77, 80, 85, 90
n=5
p=70
c=(5*70)/100=3.5, which rounds up to 4
Since c=4, take the 4th value in the arranged data, which is 85. So the 70th percentile value is 85.
Example:2
The marks of 5 students are given below.
90, 60, 80, 70, 50
Calculate the value corresponding to the 60th percentile.
Arrange in ascending order: 50, 60, 70, 80, 90
n=5
p=60
c=(5*60)/100=3
Here c is a whole number.
In the arranged data, the 3rd value is 70 and the 4th value (c+1) is 80.
So,
(70+80)/2=75
So the 60th percentile value is 75 marks.
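The three steps can be sketched as a single function and checked against both examples. `percentile_value` is a hypothetical name, and the sketch assumes 0 < p < 100 so that the (c+1)th value always exists.

```python
import math


def percentile_value(values, p):
    """Value at the p-th percentile, following steps 1 to 3 above (assumes 0 < p < 100)."""
    ordered = sorted(values)          # Step 1: ascending order
    c = len(ordered) * p / 100        # Step 2: c = (n * p) / 100
    if not c.is_integer():
        # Step 3a: not a whole number, so round up and take that position.
        position = math.ceil(c)
        return ordered[position - 1]
    # Step 3b: whole number, so take the value halfway between the c-th and (c+1)-th.
    c = int(c)
    return (ordered[c - 1] + ordered[c]) / 2


print(percentile_value([90, 61, 80, 77, 85], 70))
print(percentile_value([90, 60, 80, 70, 50], 60))
```

Library functions such as `numpy.percentile` use interpolation rules by default that can give different answers on small datasets, so results may not match this hand method exactly.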
Deciles divide the data into 10 groups. The deciles are denoted by D1, D2, D3, D4, and so on.
Formula: D=(k/10)*(n+1)
Here,
D=position of the decile in the arranged data
k=number of the decile that you want to find
n=number of observations
Example:
Find deciles D2 and D4 of the following values.
33, 44, 82, 50, 70, 90, 45, 72
Arrange the data in ascending order: 33,44,45,50,70,72,82,90
For D2:
D=(2/10)*(8+1)=1.8, which rounds to 2
D2 is the 2nd element, which is 44.
For D4:
D=(4/10)*(8+1)=3.6, which rounds to 4
D4 is the 4th element, which is 50.
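The decile positions can be computed with the same formula and rounded to the nearest position, as the example does. This is a sketch; `decile` is a hypothetical helper name, and it assumes the rounded position falls between 1 and n. The example does not say how to handle a position ending in exactly .5, so the sketch simply relies on Python's built-in `round`.

```python
def decile(values, k):
    """k-th decile: element at position (k / 10) * (n + 1), rounded to a whole position."""
    ordered = sorted(values)
    position = round(k * (len(ordered) + 1) / 10)
    # Assumption: 1 <= position <= n, which holds for the example below.
    return ordered[position - 1]


data = [33, 44, 82, 50, 70, 90, 45, 72]

print(decile(data, 2))
print(decile(data, 4))
```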