Let's look at the dataset. It is used in all the upcoming code examples.
| | Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
5 | 06 | F | Arts | 599 | C+ | 03 |
6 | 07 | P | Science | 575 | C+ | 03 |
7 | 08 | F | Arts | 600 | C | 03 |
8 | 09 | I | Commerce | 550 | C+ | 03 |
9 | 10 | J | Science | 650 | A+ | 01 |
10 | 11 | K | Arts | 680 | A+ | 01 |
11 | 12 | L | Science | 570 | C+ | 03 |
12 | 13 | M | Arts | 599 | C+ | 03 |
13 | 14 | N | Commerce | 597 | C+ | 03 |
14 | 15 | O | Science | 697 | A+ | 01 |
15 | 16 | B | Arts | 570 | C+ | 03 |
16 | 17 | D | Science | 588 | C+ | 03 |
17 | 18 | E | Science | 687 | A+ | 01 |
18 | 19 | C | Commerce | 688 | A+ | 01 |
19 | 20 | P | Arts | 588 | C+ | 03 |
20 | 21 | C | Science | 619 | B+ | 02 |
21 | 22 | M | Commerce | 600 | B+ | 02 |
22 | 23 | P | Arts | 700 | A+ | 01 |
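The page does not show how this dataset is loaded; it may have been read from a file. The sketch below is an assumption: it rebuilds the table directly with `pd.DataFrame`, with values copied from the table, so the later examples can be tried without a data file.

```python
import pandas as pd

# Sample dataset rebuilt from the table above.
# Assumption: the original page loaded it from a file; here it is typed in directly.
df = pd.DataFrame({
    "Id": list(range(1, 24)),
    "Name": list("ABADEFPFIJKLMNOBDECPCMP"),
    "Group_name": ["Science", "Commerce", "Science", "Arts", "Commerce", "Arts",
                   "Science", "Arts", "Commerce", "Science", "Arts", "Science",
                   "Arts", "Commerce", "Science", "Arts", "Science", "Science",
                   "Commerce", "Arts", "Science", "Commerce", "Arts"],
    "Total_marks": [700, 618, 700, 687, 611, 599, 575, 600, 550, 650, 680, 570,
                    599, 597, 697, 570, 588, 687, 688, 588, 619, 600, 700],
    "Grade": ["A+", "B+", "A+", "A+", "B+", "C+", "C+", "C", "C+", "A+", "A+", "C+",
              "C+", "C+", "A+", "C+", "C+", "A+", "A+", "C+", "B+", "B+", "A+"],
    "Ranking": [1, 2, 1, 1, 2, 3, 3, 3, 3, 1, 1, 3,
                3, 3, 1, 3, 3, 1, 1, 3, 2, 2, 1],
})

print(df)
```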
The head() function returns the top rows of the dataset. By default it returns 5 rows, but you can choose how many rows you get by passing a number in the brackets, for example head(7).
| | Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
Let's print the top 7 rows using the head() function.
| | Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
5 | 06 | F | Arts | 599 | C+ | 03 |
6 | 07 | P | Science | 575 | C+ | 03 |
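Here is a minimal sketch of the two head() calls above. To keep it short, it uses only the Id and Name columns of the sample dataset. It also shows that a negative number is accepted: head(-n) returns every row except the last n.

```python
import pandas as pd

# Id and Name columns of the sample dataset (other columns left out for brevity)
df = pd.DataFrame({
    "Id": list(range(1, 24)),
    "Name": list("ABADEFPFIJKLMNOBDECPCMP"),
})

top_default = df.head()          # first 5 rows (the default)
top_seven = df.head(7)           # first 7 rows
all_but_last_20 = df.head(-20)   # negative n: every row except the last 20

print(top_seven)
```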
The tail() function returns the bottom rows of the dataset. By default it returns 5 rows, but you can choose how many rows you get by passing a number in the brackets, for example tail(7).
| | Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
18 | 19 | C | Commerce | 688 | A+ | 01 |
19 | 20 | P | Arts | 588 | C+ | 03 |
20 | 21 | C | Science | 619 | B+ | 02 |
21 | 22 | M | Commerce | 600 | B+ | 02 |
22 | 23 | P | Arts | 700 | A+ | 01 |
Let's print the bottom 7 rows using the tail() function.
| | Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
16 | 17 | D | Science | 588 | C+ | 03 |
17 | 18 | E | Science | 687 | A+ | 01 |
18 | 19 | C | Commerce | 688 | A+ | 01 |
19 | 20 | P | Arts | 588 | C+ | 03 |
20 | 21 | C | Science | 619 | B+ | 02 |
21 | 22 | M | Commerce | 600 | B+ | 02 |
22 | 23 | P | Arts | 700 | A+ | 01 |
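The same idea works for tail(). This sketch again keeps only the Id and Name columns. With a negative number, tail(-n) returns every row except the first n.

```python
import pandas as pd

# Id and Name columns of the sample dataset (other columns left out for brevity)
df = pd.DataFrame({
    "Id": list(range(1, 24)),
    "Name": list("ABADEFPFIJKLMNOBDECPCMP"),
})

bottom_default = df.tail()        # last 5 rows (the default)
bottom_seven = df.tail(7)         # last 7 rows
all_but_first_20 = df.tail(-20)   # negative n: every row except the first 20

print(bottom_seven)
```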
The columns property returns all the column names.
The shape property returns how many rows and columns the dataset has, as a tuple (rows, columns).
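Both properties are accessed without brackets, because they are attributes rather than functions. The sketch below rebuilds the full sample dataset so that shape reports all 23 rows.

```python
import pandas as pd

# Full sample dataset, values copied from the table above
df = pd.DataFrame({
    "Id": list(range(1, 24)),
    "Name": list("ABADEFPFIJKLMNOBDECPCMP"),
    "Group_name": ["Science", "Commerce", "Science", "Arts", "Commerce", "Arts",
                   "Science", "Arts", "Commerce", "Science", "Arts", "Science",
                   "Arts", "Commerce", "Science", "Arts", "Science", "Science",
                   "Commerce", "Arts", "Science", "Commerce", "Arts"],
    "Total_marks": [700, 618, 700, 687, 611, 599, 575, 600, 550, 650, 680, 570,
                    599, 597, 697, 570, 588, 687, 688, 588, 619, 600, 700],
    "Grade": ["A+", "B+", "A+", "A+", "B+", "C+", "C+", "C", "C+", "A+", "A+", "C+",
              "C+", "C+", "A+", "C+", "C+", "A+", "A+", "C+", "B+", "B+", "A+"],
    "Ranking": [1, 2, 1, 1, 2, 3, 3, 3, 3, 1, 1, 3,
                3, 3, 1, 3, 3, 1, 1, 3, 2, 2, 1],
})

print(df.columns)        # all column names
rows, cols = df.shape    # shape is a (rows, columns) tuple
print(df.shape)          # (23, 6)
```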
The describe() function returns a short statistical summary of each column, such as count, mean, std, min, max and the percentiles. By default, it only summarizes the numerical columns.
| | Id | Total_marks | Ranking |
---|---|---|---|
count | 23.00000 | 23.000000 | 23.000000 |
mean | 12.00000 | 629.260870 | 2.043478 |
std | 6.78233 | 51.136375 | 0.928256 |
min | 1.00000 | 550.000000 | 1.000000 |
25% | 6.50000 | 592.500000 | 1.000000 |
50% | 12.00000 | 611.000000 | 2.000000 |
75% | 17.50000 | 687.000000 | 3.000000 |
max | 23.00000 | 700.000000 | 3.000000 |
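This sketch reproduces the summary above. It keeps the text column Name to show that describe() leaves it out by default, and that passing include="all" brings text columns into the summary as well.

```python
import pandas as pd

# Columns of the sample dataset needed here; Name is a text column
df = pd.DataFrame({
    "Id": list(range(1, 24)),
    "Name": list("ABADEFPFIJKLMNOBDECPCMP"),
    "Total_marks": [700, 618, 700, 687, 611, 599, 575, 600, 550, 650, 680, 570,
                    599, 597, 697, 570, 588, 687, 688, 588, 619, 600, 700],
    "Ranking": [1, 2, 1, 1, 2, 3, 3, 3, 3, 1, 1, 3,
                3, 3, 1, 3, 3, 1, 1, 3, 2, 2, 1],
})

stats = df.describe()                    # numerical columns only (the default)
stats_all = df.describe(include="all")   # text columns are summarized too
print(stats)
```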
The isnull() function checks whether any NaN (missing) values are present in the dataset. It returns boolean values: False means the value is present and True means it is missing.
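The sample dataset has no missing values, so isnull() would return only False for it. The sketch below uses a small hypothetical frame with two gaps instead. Adding .sum() to the boolean result counts the missing values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values (not part of the sample dataset)
df = pd.DataFrame({
    "Name": ["A", "B", None],
    "Total_marks": [700, np.nan, 611],
})

mask = df.isnull()                 # True where a value is missing
missing_per_column = mask.sum()    # True counts as 1, so this counts the gaps
print(mask)
print(missing_per_column)
```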
The info() function prints a summary of the dataset, such as how many non-null values each column contains and the data type of each column.
# | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | Id | 23 non-null | int64 |
1 | Name | 23 non-null | object |
2 | Group_name | 23 non-null | object |
3 | Total_marks | 23 non-null | int64 |
4 | Grade | 23 non-null | object |
5 | Ranking | 23 non-null | int64 |
dtypes: int64(3), object(3)
memory usage: 1.2+ KB
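info() prints its summary directly and returns None. That is why print(df.info()) shows an extra "None" line after the summary. This sketch uses the first three rows of the sample dataset. It also shows the buf parameter, which captures the summary as a string.

```python
import io

import pandas as pd

# First three rows of the sample dataset
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Name": ["A", "B", "A"],
    "Group_name": ["Science", "Commerce", "Science"],
    "Total_marks": [700, 618, 700],
    "Grade": ["A+", "B+", "A+"],
    "Ranking": [1, 2, 1],
})

result = df.info()   # prints the summary; the return value is None

# Capture the same summary as a string instead of printing it
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```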
The unique() function returns the unique values of a column.
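unique() is called on a single column (a Series). It returns the distinct values in the order they first appear. This sketch uses the first five values of the Group_name column.

```python
import pandas as pd

# First five values of the Group_name column
group = pd.Series(["Science", "Commerce", "Science", "Arts", "Commerce"],
                  name="Group_name")

distinct = group.unique()   # distinct values, in order of first appearance
print(distinct)
```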
The dtypes property shows the data type of each column.
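Like shape, dtypes is a property, so it is used without brackets. This sketch uses the first three rows of the sample dataset. Text columns are reported as object in most pandas versions, though newer versions may show a dedicated string dtype instead.

```python
import pandas as pd

# First three rows of the sample dataset
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Name": ["A", "B", "A"],
    "Group_name": ["Science", "Commerce", "Science"],
    "Total_marks": [700, 618, 700],
    "Grade": ["A+", "B+", "A+"],
    "Ranking": [1, 2, 1],
})

types = df.dtypes   # one entry per column: int64 for the numbers, object/str for text
print(types)
```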
This function is used to set a caption for the dataset, which is shown above the table when it is displayed.
The nlargest() function returns the largest values of a column. In the brackets you can pass how many of the largest values you want to see; by default, it returns the 5 largest values.
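This sketch runs nlargest() on the Total_marks column of the sample dataset. Three students share the top score of 700, so all three appear in the result, followed by 697 and 688.

```python
import pandas as pd

# Total_marks column of the sample dataset
marks = pd.Series([700, 618, 700, 687, 611, 599, 575, 600, 550, 650, 680, 570,
                   599, 597, 697, 570, 588, 687, 688, 588, 619, 600, 700],
                  name="Total_marks")

top_five = marks.nlargest()     # 5 largest values (the default)
top_three = marks.nlargest(3)   # 3 largest values
print(top_five)
```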
The nsmallest() function returns the smallest values of a column. In the brackets you can pass how many of the smallest values you want to see; by default, it returns the 5 smallest values.
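Here is the same column with nsmallest(). Two rows share the fifth-smallest value, 588. With the default keep="first", only the one that appears first in the data is returned.

```python
import pandas as pd

# Total_marks column of the sample dataset
marks = pd.Series([700, 618, 700, 687, 611, 599, 575, 600, 550, 650, 680, 570,
                   599, 597, 697, 570, 588, 687, 688, 588, 619, 600, 700],
                  name="Total_marks")

bottom_five = marks.nsmallest()     # 5 smallest values (the default)
bottom_three = marks.nsmallest(3)   # 3 smallest values
print(bottom_five)
```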
By default, the dataset gets a numeric index (0, 1, 2, ...). If you want to use another column of the dataset as the index instead, use the set_index() function: write a column name in the brackets, and that column becomes the index of the dataset. Here the Ranking column is used as the index.
Ranking | Id | Name | Group_name | Total_marks | Grade |
---|---|---|---|---|---|
01 | 01 | A | Science | 700 | A+ |
02 | 02 | B | Commerce | 618 | B+ |
01 | 03 | A | Science | 700 | A+ |
01 | 04 | D | Arts | 687 | A+ |
02 | 05 | E | Commerce | 611 | B+ |
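This sketch reproduces the table above from the first five rows of the sample dataset. set_index() returns a new DataFrame, so the original keeps its default index unless you assign the result back or pass inplace=True.

```python
import pandas as pd

# First five rows of the sample dataset
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Name": ["A", "B", "A", "D", "E"],
    "Group_name": ["Science", "Commerce", "Science", "Arts", "Commerce"],
    "Total_marks": [700, 618, 700, 687, 611],
    "Grade": ["A+", "B+", "A+", "A+", "B+"],
    "Ranking": [1, 2, 1, 1, 2],
})

by_ranking = df.set_index("Ranking")   # Ranking becomes the index
print(by_ranking.head())
```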
The index property returns details about the index of the dataset, such as its start value, stop value (exclusive) and step size.
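For a dataset with the default numbering, df.index is a RangeIndex. Its start, stop and step attributes hold the details mentioned above. The stop value is one past the last label. This sketch uses a frame with 23 rows, like the sample dataset.

```python
import pandas as pd

# 23 rows with the default 0-based index, like the sample dataset
df = pd.DataFrame({"Id": list(range(1, 24))})

idx = df.index
print(idx)
print(idx.start, idx.stop, idx.step)   # stop is exclusive: the last label is 22
```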
Suppose your index is 0, 1, 2, 3, 4, 5 and you drop row 2. The index becomes 0, 1, 3, 4, 5, but you want it to be 0, 1, 2, 3, 4. To renumber it, use the reset_index() function and pass drop=True in the brackets, so that the old index is discarded instead of being added as a new column.
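Here is the scenario above as a small sketch. The frame is hypothetical, with six rows. It also shows what happens without drop=True: the old labels are kept in a new column named "index".

```python
import pandas as pd

# Hypothetical six-row frame with the default index 0..5
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6]})

dropped = df.drop(2)                          # index is now 0, 1, 3, 4, 5
renumbered = dropped.reset_index(drop=True)   # index is 0, 1, 2, 3, 4 again
kept_old = dropped.reset_index()              # old labels saved in an "index" column

print(renumbered)
```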
The nunique() function shows how many unique values each column of the dataset contains.
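This sketch applies nunique() to three columns of the sample dataset. It returns one count per column.

```python
import pandas as pd

# Three columns of the sample dataset
df = pd.DataFrame({
    "Group_name": ["Science", "Commerce", "Science", "Arts", "Commerce", "Arts",
                   "Science", "Arts", "Commerce", "Science", "Arts", "Science",
                   "Arts", "Commerce", "Science", "Arts", "Science", "Science",
                   "Commerce", "Arts", "Science", "Commerce", "Arts"],
    "Grade": ["A+", "B+", "A+", "A+", "B+", "C+", "C+", "C", "C+", "A+", "A+", "C+",
              "C+", "C+", "A+", "C+", "C+", "A+", "A+", "C+", "B+", "B+", "A+"],
    "Ranking": [1, 2, 1, 1, 2, 3, 3, 3, 3, 1, 1, 3,
                3, 3, 1, 3, 3, 1, 1, 3, 2, 2, 1],
})

unique_counts = df.nunique()   # number of distinct values per column
print(unique_counts)
```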
The value_counts() function returns the count of each unique value in a column, sorted from the most frequent value to the least frequent one.
It has a parameter named normalize, which can be True or False (the default). With normalize=True, the function returns relative frequencies (proportions) instead of counts, so all the results add up to 1.
You can also reverse the order of the output with the ascending parameter: ascending=True lists the least frequent values first.
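Here are the three variants above applied to the Group_name column of the sample dataset, which has 9 Science, 8 Arts and 6 Commerce students. With normalize=True, the proportions sum to 1 (up to floating-point rounding). Multiply by 100 if you want percentages.

```python
import pandas as pd

# Group_name column of the sample dataset
group = pd.Series(["Science", "Commerce", "Science", "Arts", "Commerce", "Arts",
                   "Science", "Arts", "Commerce", "Science", "Arts", "Science",
                   "Arts", "Commerce", "Science", "Arts", "Science", "Science",
                   "Commerce", "Arts", "Science", "Commerce", "Arts"],
                  name="Group_name")

counts = group.value_counts()                    # most frequent first
shares = group.value_counts(normalize=True)      # proportions instead of counts
least_first = group.value_counts(ascending=True) # least frequent first

print(counts)
print(shares)
print(least_first)
```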
The rename() function is used to rename columns. Pass a dictionary to its columns parameter that maps each column name you want to change to its new name.
| | Id | Name | Group_name | Final_marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
| | Id | Name | Group_name | Final_marks | Grade | Rank |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
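The two tables above correspond to renaming one column, then two columns at once. This sketch uses three columns from the first three rows of the sample dataset. rename() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# A few columns from the first three rows of the sample dataset
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Total_marks": [700, 618, 700],
    "Ranking": [1, 2, 1],
})

one_renamed = df.rename(columns={"Total_marks": "Final_marks"})
two_renamed = df.rename(columns={"Total_marks": "Final_marks", "Ranking": "Rank"})

print(two_renamed)
```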
This function is used to check values, that is, whether a column contains specific values or not. It returns True at the positions of the column where the value is found and False everywhere else.
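The text does not name the function. Assuming it is isin(), this sketch checks the first five values of Group_name. isin() takes a list of values to look for, so one call can check several values at once.

```python
import pandas as pd

# First five values of the Group_name column
group = pd.Series(["Science", "Commerce", "Science", "Arts", "Commerce"],
                  name="Group_name")

is_science = group.isin(["Science"])               # True where the value is Science
is_arts_or_commerce = group.isin(["Arts", "Commerce"])

print(is_science)
```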
This function is used to see how many duplicate values are present in a column or in the dataset.
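Assuming the function meant here is duplicated(): it marks each repeat of an earlier value with True, so summing the result counts the duplicates. In the sample dataset, the Name column has 23 values but only 14 distinct names, so 9 values are repeats. Whole rows never repeat because every Id is different.

```python
import pandas as pd

# Id and Name columns of the sample dataset
df = pd.DataFrame({
    "Id": list(range(1, 24)),
    "Name": list("ABADEFPFIJKLMNOBDECPCMP"),
})

name_dupes = df["Name"].duplicated()   # True for every repeat of an earlier name
row_dupes = df.duplicated()            # True for rows identical to an earlier row

print(int(name_dupes.sum()), int(row_dupes.sum()))
```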
This function is used to remove duplicate values from a column.
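Assuming the function meant here is drop_duplicates(), this sketch removes the repeated names from the Name column of the sample dataset. By default, the first occurrence of each value is kept.

```python
import pandas as pd

# Name column of the sample dataset
names = pd.Series(list("ABADEFPFIJKLMNOBDECPCMP"), name="Name")

unique_names = names.drop_duplicates()   # keeps the first occurrence of each name
print(unique_names)
```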
To remove rows or columns, use the drop() function. To drop a column, pass the column name; to drop a row, pass the row's index label. The axis parameter decides which one is dropped: pass axis=1 for columns and axis=0 for rows.
By default, drop() returns a new copy of the data with the rows or columns removed, so the original dataset is not affected. If you want to change the original data, pass another parameter, inplace=True.
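This sketch shows both axis values and the effect of inplace=True on a hypothetical three-row frame. With inplace=True, drop() changes the frame itself and returns None.

```python
import pandas as pd

# Hypothetical three-row frame with columns from the sample dataset
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Name": ["A", "B", "A"],
    "Grade": ["A+", "B+", "A+"],
})

no_grade = df.drop("Grade", axis=1)   # new frame without the Grade column
no_first_row = df.drop(0, axis=0)     # new frame without the row labelled 0
columns_before = list(df.columns)     # df itself is still unchanged here

returned = df.drop("Grade", axis=1, inplace=True)   # modifies df directly

print(df)
```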
The groupby() function is used to group data based on one or more columns. It creates a GroupBy object on which you can apply aggregation functions such as sum, mean and median. You can also use groupby together with the agg() (aggregate) function.
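This sketch groups the sample dataset by Group_name and aggregates Total_marks. It calls sum() directly, then passes a list of function names to agg() to get several statistics in one table.

```python
import pandas as pd

# Group_name and Total_marks columns of the sample dataset
df = pd.DataFrame({
    "Group_name": ["Science", "Commerce", "Science", "Arts", "Commerce", "Arts",
                   "Science", "Arts", "Commerce", "Science", "Arts", "Science",
                   "Arts", "Commerce", "Science", "Arts", "Science", "Science",
                   "Commerce", "Arts", "Science", "Commerce", "Arts"],
    "Total_marks": [700, 618, 700, 687, 611, 599, 575, 600, 550, 650, 680, 570,
                    599, 597, 697, 570, 588, 687, 688, 588, 619, 600, 700],
})

grouped = df.groupby("Group_name")                            # GroupBy object
total_by_group = grouped["Total_marks"].sum()                 # one sum per group
summary = grouped["Total_marks"].agg(["min", "max", "mean"])  # several at once

print(total_by_group)
print(summary)
```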