Some built-in function of pandas

Let's look at the dataset first. It is used in the code examples that follow.

Input
import pandas as pd
df=pd.read_csv("practice.csv")
print(df)
Output

	Id	Name	Group_name	Total_marks	Grade	Ranking
0	01	A	Science	700	A+	01
1	02	B	Commerce	618	B+	02
2	03	A	Science	700	A+	01
3	04	D	Arts	687	A+	01
4	05	E	Commerce	611	B+	02
5	06	F	Arts	599	C+	03
6	07	P	Science	575	C+	03
7	08	F	Arts	600	C	03
8	09	I	Commerce	550	C+	03
9	10	J	Science	650	A+	01
10	11	K	Arts	680	A+	01
11	12	L	Science	570	C+	03
12	13	M	Arts	599	C+	03
13	14	N	Commerce	597	C+	03
14	15	O	Science	697	A+	01
15	16	B	Arts	570	C+	03
16	17	D	Science	588	C+	03
17	18	E	Science	687	A+	01
18	19	C	Commerce	688	A+	01
19	20	P	Arts	588	C+	03
20	21	C	Science	619	B+	02
21	22	M	Commerce	600	B+	02
22	23	P	Arts	700	A+	01

head() function

The head() function returns the first rows of the dataset. By default you get 5 rows, but you can change the number by passing a value in the brackets. Any whole number works; if it is larger than the number of rows, you get the whole dataset.

Example 1:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.head()
print(df1)
Output

	Id	Name	Group_name	Total_marks	Grade	Ranking
0	01	A	Science	700	A+	01
1	02	B	Commerce	618	B+	02
2	03	A	Science	700	A+	01
3	04	D	Arts	687	A+	01
4	05	E	Commerce	611	B+	02

Let's print the top 7 rows using the head() function.

Example 2:

Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df.head(7)
print(df1)
Output

	Id	Name	Group_name	Total_marks	Grade	Ranking
0	01	A	Science	700	A+	01
1	02	B	Commerce	618	B+	02
2	03	A	Science	700	A+	01
3	04	D	Arts	687	A+	01
4	05	E	Commerce	611	B+	02
5	06	F	Arts	599	C+	03
6	07	P	Science	575	C+	03
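As a small sketch beyond the two examples above: head() also accepts a negative number, in which case it returns every row except the last n. The DataFrame below is a hand-built stand-in for the first rows of practice.csv (an assumption, so that the snippet runs without the file).

```python
import pandas as pd

# Hand-built stand-in for the first rows of practice.csv (assumption)
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Name": ["A", "B", "A", "D", "E", "F"],
    "Total_marks": [700, 618, 700, 687, 611, 599],
})

first_two = df.head(2)           # the first 2 rows
all_but_last_two = df.head(-2)   # every row except the last 2

print(first_two)
print(all_but_last_two)
```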

tail() function

The tail() function returns the last rows of the dataset. By default you get 5 rows, but you can change the number by passing a value in the brackets. Any whole number works; if it is larger than the number of rows, you get the whole dataset.

Example 1:

Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df.tail()
print(df1)
Output

	Id	Name	Group_name	Total_marks	Grade	Ranking
18	19	C	Commerce	688	A+	01
19	20	P	Arts	588	C+	03
20	21	C	Science	619	B+	02
21	22	M	Commerce	600	B+	02
22	23	P	Arts	700	A+	01

Let's print the bottom 7 rows using the tail() function.

Example 2:

Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df.tail(7)
print(df1)
Output

	Id	Name	Group_name	Total_marks	Grade	Ranking
16	17	D	Science	588	C+	03
17	18	E	Science	687	A+	01
18	19	C	Commerce	688	A+	01
19	20	P	Arts	588	C+	03
20	21	C	Science	619	B+	02
21	22	M	Commerce	600	B+	02
22	23	P	Arts	700	A+	01

columns property

The columns property returns all the column names.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.columns
print(df1)
Output
Index(['Id', 'Name', 'Group_name', 'Total_marks', 'Grade', 'Ranking'], dtype='object')

shape property

The shape property tells you how many rows and columns the dataset has. The result is a pair: (number of rows, number of columns).

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.shape
print(df1)
Output
(23, 6)

describe() function

The describe() function gives a short statistical summary of each column of the dataset: count, mean, std, min, max and so on. By default it summarizes only the numerical columns.

Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df.describe()
print(df1)
Output

	Id	Total_marks	Ranking
count	23.00000	23.000000	23.000000
mean	12.00000	629.260870	2.043478
std	6.78233	51.136375	0.928256
min	1.00000	550.000000	1.000000
25%	6.50000	592.500000	1.000000
50%	12.00000	611.000000	2.000000
75%	17.50000	687.000000	3.000000
max	23.00000	700.000000	3.000000
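The summary above covers only numerical columns. As a hedged sketch, passing include="all" asks describe() to summarize text columns as well, adding rows such as unique, top and freq. The small DataFrame is a hand-built stand-in for practice.csv (an assumption).

```python
import pandas as pd

# Hand-built stand-in for a few rows of practice.csv (assumption)
df = pd.DataFrame({
    "Name": ["A", "B", "A", "D"],
    "Group_name": ["Science", "Commerce", "Science", "Arts"],
    "Total_marks": [700, 618, 700, 687],
})

numeric_summary = df.describe()            # numerical columns only
full_summary = df.describe(include="all")  # numerical and text columns

print(numeric_summary)
print(full_summary)
```

Cells that do not apply to a column (for example the mean of Name) are shown as NaN in the full summary.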

isnull() function

The isnull() function checks whether the dataset contains any NaN or missing values. It returns Boolean values: False means the value is present and True means the value is missing.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.isnull()
print(df1)

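How to see the percentage of missing values in each column of the dataset?

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")

# Percentage of missing values in each column
ms=df.isnull().sum()/df.shape[0]*100
print(ms)
# True for the columns where at least 17% of the values are missing
print(ms>=17)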
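The practice dataset has no missing values, so here is a minimal sketch on a hand-built DataFrame (an assumption, not part of practice.csv) that does contain some. It counts the missing values per column and turns the counts into percentages.

```python
import numpy as np
import pandas as pd

# Hand-built data with deliberate gaps (assumption, not practice.csv)
df = pd.DataFrame({
    "Name": ["A", "B", None, "D"],
    "Total_marks": [700, np.nan, np.nan, 687],
})

missing_count = df.isnull().sum()           # number of missing values per column
missing_percent = df.isnull().mean() * 100  # share of missing values, in percent

print(missing_count)
print(missing_percent)
```

Taking the mean of the Boolean result gives the same figure as dividing the sum by the number of rows.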

info() function

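The info() function gives an overview of the dataset: how many non-null values each column holds, the data type of each column, and so on. It prints the report itself and returns None, which is why None appears at the end of the output when the result is printed.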

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.info()
print(df1)
Output
RangeIndex: 23 entries, 0 to 22
Data columns (total 6 columns):

#	Column	Non-Null Count	Dtype
0	Id	23 non-null	int64
1	Name	23 non-null	object
2	Group_name	23 non-null	object
3	Total_marks	23 non-null	int64
4	Grade	23 non-null	object
5	Ranking	23 non-null	int64

dtypes: int64(3), object(3)
memory usage: 1.2+ KB
None

unique() function

The unique() function returns the unique values of a column.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.Name.unique()
print(df1)
Output
['A' 'B' 'D' 'E' 'F' 'P' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'C']

dtypes property

This property will show the data type of columns.

Example 1:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.dtypes
print(df1)
Output
Id    int64
Name    object
Group_name    object
Total_marks    int64
Grade    object
Ranking    int64
dtype: object

Example 2:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df["Name"].dtypes
print(df1)
Output
object

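style.set_caption() function

This function sets a caption for the dataset. The caption shows up when the styled table is rendered, for example in a Jupyter notebook; print() only shows the Styler object, not the table.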

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.style.set_caption("My dataset")
print(df1)

nlargest() function

This function returns the largest values of a column. In the brackets you can pass how many of the largest values you want to see; by default it returns the top 5.

Example:
Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df["Total_marks"].nlargest(6)
print(df1)
Output
0     700
2     700
22     700
14     697
18     688
3     687
Name: Total_marks, dtype: int64

nsmallest() function

This function returns the smallest values of a column. In the brackets you can pass how many of the smallest values you want to see; by default it returns the 5 smallest.

Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df["Total_marks"].nsmallest(4)
print(df1)
Output
8     550
11     570
15     570
6     575
Name: Total_marks, dtype: int64
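The examples above call nlargest() and nsmallest() on a single column, so only that column comes back. As a hedged sketch, calling them on the whole DataFrame and naming the column to rank by keeps the full rows. The DataFrame is a hand-built stand-in for the first rows of practice.csv (an assumption).

```python
import pandas as pd

# Hand-built stand-in for the first rows of practice.csv (assumption)
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Name": ["A", "B", "A", "D", "E", "F"],
    "Total_marks": [700, 618, 700, 687, 611, 599],
})

top_three = df.nlargest(3, "Total_marks")     # full rows with the 3 highest marks
bottom_two = df.nsmallest(2, "Total_marks")   # full rows with the 2 lowest marks

print(top_three)
print(bottom_two)
```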

set_index() function

By default, the dataset gets a numeric index. If you want one of the dataset's own columns to act as the index instead, use the set_index() function. Write the column name in the brackets and the function makes that column the index column of the dataset.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df2=df.set_index("Ranking")
print(df2.head())
Output

	Id	Name	Group_name	Total_marks	Grade
Ranking
01	01	A	Science	700	A+
02	02	B	Commerce	618	B+
01	03	A	Science	700	A+
01	04	D	Arts	687	A+
02	05	E	Commerce	611	B+

index property

By using the index property you can get the details of the index column (such as the start index, end index and step size).

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.index
print(df1)

How to reset index?

Suppose your index is 0,1,2,3,4,5 and you drop row number 2, so the index becomes 0,1,3,4,5, but you want it to be 0,1,2,3,4. To do this you have to reset the index with the reset_index() function, passing drop=True in the brackets so that the old index is not kept as a new column.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df=df.drop(2, axis=0)
df=df.reset_index(drop=True)
print(df)
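To make the effect of drop=True visible, here is a minimal sketch on a hand-built four-row DataFrame (an assumption, not practice.csv). It drops one row and then resets the index both with and without drop=True.

```python
import pandas as pd

# Hand-built four-row example (assumption, not practice.csv)
df = pd.DataFrame({"Name": ["A", "B", "C", "D"]})

dropped = df.drop(2, axis=0)               # index is now 0, 1, 3
reset = dropped.reset_index(drop=True)     # index is 0, 1, 2; old index discarded
kept = dropped.reset_index()               # old index saved in a column named "index"

print(dropped)
print(reset)
print(kept)
```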

nunique() function

This function shows how many unique values each column of the dataset contains.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.nunique()
print(df1)
Output
Id     23
Name     14
Group_name     3
Total_marks     16
Grade    4
Ranking     3
dtype: int64

value_counts() function

value_counts() returns the count of each unique value in a column, sorted from most to least frequent.

Example 1:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.Grade.value_counts()
print(df1)
Output
A+   9
C+   9
B+   4
C     1
Name: Grade, dtype: int64

There is a parameter named normalize, which can be True or False (the default). With normalize=True the function returns relative frequencies instead of counts. If you sum all the results, the total is 1; multiply by 100 to get percentages.

Example 2:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.Grade.value_counts(normalize=True)
print(df1)
Output
A+   0.391304
C+   0.391304
B+   0.173913
C     0.043478
Name: Grade, dtype: float64

You can also change the order of the output by using the ascending parameter.

Example 3:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.Grade.value_counts(ascending=True)
print(df1)
Output
C    1
B+   4
A+   9
C+   9
Name: Grade, dtype: int64
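As a small sketch of the normalize option, the snippet below checks that the relative frequencies add up to 1 and converts them to percentages. The grades are a hand-built sample (an assumption), not the full Grade column of practice.csv.

```python
import pandas as pd

# Hand-built sample of grades (assumption, not the full practice.csv column)
grades = pd.Series(["A+", "B+", "A+", "C+", "A+", "C+", "B+", "C"], name="Grade")

share = grades.value_counts(normalize=True)  # relative frequencies, summing to 1
percent = share * 100                        # the same figures as percentages

print(share)
print(percent)
```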

rename() function

This function is used to rename columns. Inside the function, use the parameter named columns and pass a dictionary that maps each old column name to its new name.

Example 1:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df.rename(columns={"Total_marks":"Final_marks"},inplace=True)
print(df.head())
Output

	Id	Name	Group_name	Final_marks	Grade	Ranking
0	01	A	Science	700	A+	01
1	02	B	Commerce	618	B+	02
2	03	A	Science	700	A+	01
3	04	D	Arts	687	A+	01
4	05	E	Commerce	611	B+	02

Example 2:

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df.rename(columns={"Total_marks":"Final_marks","Ranking":"Rank"},inplace=True)
print(df.head())
Output

	Id	Name	Group_name	Final_marks	Grade	Rank
0	01	A	Science	700	A+	01
1	02	B	Commerce	618	B+	02
2	03	A	Science	700	A+	01
3	04	D	Arts	687	A+	01
4	05	E	Commerce	611	B+	02

str.contains() function

This function checks whether the values of a column contain a specific piece of text. It returns True at each position where the text is found and False everywhere else.

Input
import pandas as pd
df=pd.read_csv("practice.csv")
df1=df.Group_name.str.contains("Science")
print(df1)
Output
0    True
1    False
2    True
3    False
4    False
5    False
6    True
7    False
8    False
9    True
10   False
11   True
12   False
13   False
14   True
15   False
16   True
17   True
18   False
19   False
20   True
21   False
22   False
Name: Group_name, dtype: bool
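The True/False result is most useful as a filter. As a hedged sketch, the snippet below uses it to keep only the matching rows, and also shows the case parameter for a case-insensitive match. The DataFrame is a hand-built stand-in for a few rows of practice.csv (an assumption).

```python
import pandas as pd

# Hand-built stand-in for a few rows of practice.csv (assumption)
df = pd.DataFrame({
    "Name": ["A", "B", "A", "D", "E"],
    "Group_name": ["Science", "Commerce", "Science", "Arts", "Commerce"],
})

mask = df["Group_name"].str.contains("Science")   # True where the text is found
science_rows = df[mask]                           # keep only the rows marked True

# case=False ignores upper and lower case when matching
commerce_rows = df[df["Group_name"].str.contains("commerce", case=False)]

print(science_rows)
print(commerce_rows)
```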

duplicated() function

This function marks every value that repeats an earlier one. Adding sum() counts how many duplicate values are present in a column or dataset.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.Group_name.duplicated().sum()
print(df1)
Output
20

drop_duplicates() function

This function removes duplicate values from a column, keeping the first occurrence of each value.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")
df1=df.Group_name.drop_duplicates()
print(df1)
Output
0   Science
1   Commerce
3   Arts
Name: Group_name, dtype: object
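drop_duplicates() also works on a whole DataFrame. As a minimal sketch, the snippet below removes fully identical rows, then removes rows that repeat a value in one chosen column through the subset parameter, and finally keeps the last occurrence instead of the first with keep="last". The DataFrame is a hand-built stand-in for practice.csv (an assumption).

```python
import pandas as pd

# Hand-built stand-in for a few rows of practice.csv (assumption)
df = pd.DataFrame({
    "Name": ["A", "B", "A", "D"],
    "Group_name": ["Science", "Commerce", "Science", "Arts"],
    "Total_marks": [700, 618, 700, 687],
})

no_repeated_rows = df.drop_duplicates()                              # rows 0 and 2 are identical
first_per_group = df.drop_duplicates(subset="Group_name")            # first row of each group
last_per_group = df.drop_duplicates(subset="Group_name", keep="last")  # last row of each group

print(no_repeated_rows)
print(first_per_group)
print(last_per_group)
```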

How to drop a column, row or value from a dataset?

The drop() function is used for this. To drop a column, pass the column name; to drop a row, pass the row's index number. There is a parameter named axis: pass axis=1 to drop a column and axis=0 to drop a row.

The drop() function creates a copy and performs the drop operation on it, so the main dataset is not affected. If you want to change the main dataset itself, pass another parameter, inplace=True.

Input
import pandas as pd
df=pd.read_csv("D:\\CSV Datasets for practice\\practice.csv")

# Drop the 'Grade' column
df1 = df.drop('Grade', axis=1)

# Drop multiple columns
df2 = df.drop(['Grade','Name'], axis=1)

# Drop a single row
df3 = df.drop(4, axis=0)

# Drop multiple rows
df4 = df.drop([4,6,10], axis=0)
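The difference between the default copy and inplace=True can be sketched as follows. The DataFrame is a hand-built stand-in for practice.csv, and the helper name columns_before is only illustrative (both are assumptions).

```python
import pandas as pd

# Hand-built stand-in for a few rows of practice.csv (assumption)
df = pd.DataFrame({
    "Name": ["A", "B", "D"],
    "Grade": ["A+", "B+", "A+"],
    "Total_marks": [700, 618, 687],
})

# Default behaviour: a new DataFrame is returned, df itself is unchanged
without_grade = df.drop("Grade", axis=1)
columns_before = list(df.columns)   # still contains 'Grade'

# inplace=True: df itself is modified and the call returns None
result = df.drop("Grade", axis=1, inplace=True)

print(columns_before)
print(list(df.columns))
print(result)
```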

groupby() function

This function groups data based on one or more columns. It creates a group object on which we can apply various mathematical functions such as sum, mean, median, etc. We can also use groupby with the aggregate function.

Input:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Alice'], 'age': [25, 30, 35, 27], 'salary': [50000, 60000, 70000, 55000]})

# Group the data by the 'name' column and calculate the mean salary for each group
mean_salary_by_name = df.groupby('name')['salary'].mean()

# Print the mean salary for each name
print(mean_salary_by_name)

Output:
name
Alice      52500.0
Bob        60000.0
Charlie    70000.0
Name: salary, dtype: float64

Input:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Alice'], 'age': [25, 30, 35, 27], 'salary': [50000, 60000, 70000, 55000]})

# Group the data by the 'name' and 'age' columns and calculate the total salary for each group
total_salary_by_name_and_age = df.groupby(['name', 'age'])['salary'].sum()

print(total_salary_by_name_and_age)

Output:

name     age
Alice    25     50000
         27     55000
Bob      30     60000
Charlie  35     70000
Name: salary, dtype: int64
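The text above mentions using groupby with an aggregate function. As a minimal sketch, agg() can apply several functions in one call, giving one result column per function. It reuses the sample DataFrame from the examples above; the variable name salary_summary is only illustrative.

```python
import pandas as pd

# Same sample DataFrame as in the groupby examples above
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Alice'],
    'age': [25, 30, 35, 27],
    'salary': [50000, 60000, 70000, 55000],
})

# Total, average and highest salary for each name, computed in one call
salary_summary = df.groupby('name')['salary'].agg(['sum', 'mean', 'max'])

print(salary_summary)
```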
