Learn Python

Learn Data Structure & Algorithm

Learn Numpy

Pandas Introduction

Pandas Series

Pandas DataFrame

Pandas Read Files

Pandas Some functions and properties

Pandas Math Function

Pandas Selection

Pandas Change Type

Pandas Concatenate & Split

Pandas Sorting

Pandas Filter

Pandas Data Cleaning

Pandas Group by

Pandas Time Series

Pandas Analysis 1

Pandas Analysis 2

Pandas Analysis 3

Matplotlib

Learn Seaborn

Learn Statistics

Learn Math

Learn MATLAB

Learn Machine learning

Learn Github

Learn OpenCV

Learn Deep Learning

Learn MySQL

Learn MongoDB

Learn Web scraping

Learn Excel

Learn Power BI

Learn Tableau

Learn Docker

Learn Hadoop

Split and concatenation in pandas

concat() function

Multiple datasets or columns in pandas can concatenate by using the concat() function. If the names of the columns are same then it will concat one after another. If the names aren't same then it will concat side by side. So

Example 1:

Input
import pandas as pd
x=pd.DataFrame({"Id":[1,2,3],"Name":["A","B","c"]})
y=pd.DataFrame({"Id":[1,2,3],"Name":["D","E","F"]})
z=pd.concat([x,y])
print(z)
Output
  Id   Name
0 1   A
1 2   B
2 3   c
  Id   Name
0 1   D
1 2   E
2 3   F
  Id   Name
0 1   A
1 2   B
2 3   c
0 1   D
1 2   E
2 3   F

In example 1 you can see that after concatenation the index numbers are repeated. If you don't want this then you have to pass another parameter and that is ignore_index. It has two values one is True(default) and the other is False. To stop repeating index use True.

Example 2:

Input
import pandas as pd
x=pd.DataFrame({"Id":[1,2,3],"Name":["A","B","c"]})
y=pd.DataFrame({"Id":[1,2,3],"Name":["D","E","F"]})
z=pd.concat([x,y],ignore_index=True)
print(x)
print(y)
print(z)
Output
  Id   Name
0 1   A
1 2   B
2 3   c
  Id   Name
0 1   D
1 2   E
2 3   F
  Id   Name
0 1   A
1 2   B
2 3   c
3 1   D
4 2   E
5 3   F

You can contact data frame, series, column vertically or horizontal. To do this use another parameter named axis and the value is 0 or 1. 0 means vertical and 1 means horizontal.

Example 3:

Input
import pandas as pd
x=pd.DataFrame({"Id":[1,2,3],"Name":["A","B","c"]})
y=pd.DataFrame({"Id":[1,2,3],"Name":["D","E","F"]})
z=pd.concat([x,y],ignore_index=True,axis=1)
v=pd.concat([x,y],ignore_index=True,axis=0)
print(x)
print(y)
print("for axis =1")
print(z)
print("for axis=0")
print(v)
Output
  Id   Name
0 1   A
1 2   B
2 3   c
  Id   Name
0 1   D
1 2   E
2 3   F

for axis =1
     0    1    2   3
0   1   A   1   D
1   2   B   2   E
2   3   c   3   F

for axis=0
    Id    Name
0   1   A
1   2   B
2   3   c
3   1   D
4   2   E
5   3   F

Let's see examples using datasets

Input
import pandas as pd
x=pd.read_csv("D:\\CSV Datasets for practice\\practice1.csv")
print(x)
Output
Id Name
0 01 A
1 02 B
2 03 C

Input
import pandas as pd
y=pd.read_csc("D:\\CSV Datasets for practice\\practice2.csv")
print(y)
Output
Id Name
0 01 D
1 02 E
2 03 F

Input
import pandas as pd
z=pd.concat([x,y])
print(z)
Output
Id Name
0 01 A
1 02 B
2 03 C
0 01 D
1 02 E
2 03 F

Example 4:

Input
import pandas as pd
df1=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
df2=pd.read_csv("D:\\ document\\datasets\\restaurant_bill.csv")
x=df1[["total_bill","size","day"]]
y=df2[["Name","Total marks","Grade"]]
z=pd.concat([x,y],ignore_index=True)
Print("First dataset")
print(df1.head())
Print("second dataset")
print(df2.head())
Print("columns from first dataset")
print(x.head())
Print("columns from second dataset")
print(y.head())
Print("The final output")
print(z.head())
Output

First dataset


total_bil sex time tip size day smoker
0 500 male lunch 30 2 sat no
1 648 male dinner 35 3 sat+ yes
2 75 female dinner 10 1 sat+ no
3 159 female dinner 12 4 sat+ yes
4 250 male lunch 40 3 sat+ no

second dataset

Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02

columns from first dataset

total_bil size day
0 500 2 sat
1 648 3 sat
2 75 1 sat
3 159 4 sat
4 250 3 sat

columns from second dataset

Name Total_marks Grade
0 A 700 A+
1 B 618 B+
2 A 700 A+
3 D 687 A+
4 E 611 B+

The final output

total_bil size day Name Total_marks Grade
0 500 2 sat nan nan nan
1 648 3 sat nan nan nan
2 75 1 sat nan nan nan
3 159 4 sat nan nan nan
4 250 3 sat nan nan nan

In example 5, because the column names are not same so it will not concat vertically, it will concat horizontally and those values which can't concat vertically will fill by the default nan value. Now if the column names are same then all the values will concat vertically.

How to marge columns or datasets?

You can merge multiple datasets or columns in pandas by using the merge() function.

Example 1:

Input
import pandas as pd
x=pd.DataFrame({"id":[1,2,3],"Name":["A","B","c"]})
y=pd.DataFrame({"Section":[1,2,3],"Year":[2020,2021,2022]})
z=pd.merge(x,y,right_index=True,left_index=True)
print(x)
print(y)
print(z)
Output
   id   Name
0   1      A
1   2      B
2   3     c
     Section   Year
  0     1       2020
  1     2       2021
  2     3       2022
    id   Name   Section   Year
0   1     A          1           2020
1   2     B          2           2021
2   3     c          3           2022

Let's see examples using datasets

Example 2:

Input
import pandas as pd
x=pd.read_csv("D:\\CSV Datasets for practice\\practice1.csv")
print(x)
Output
Id Name
0 01 A
1 02 B
2 03 C

y=pd.read_csc("D:\\CSV Datasets for practice\\practice2.csv")
print(y)
Output
Section Year
0 1 2020
1 2 2021
2 3 2022

z=pd.merge(x,y,left_index=True,right_index=True)
print(z)
Output
Id Name Section Year
0 01 A 1 2020
1 02 B 2 2021
2 03 C 3 2022

Let's see some other examples using datasets

Example 3:

Input
import pandas as pd
df1=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
df2=pd.read_csv("D:\\ document\\datasets\\restaurant_bill.csv")
x=df1[["total_bill","size","day"]]
y=df2[["Name","Total marks","Grade"]]
z=pd.merge(x,y,right_index=True,left_index=True)
print("First dataset")
print(df1.head())
Print("second dataset")
print(df2.head())
print("columns from first dataset")
print(x.head())
Print("columns from second dataset")
print(y.head())
Print("Final output")
print(z.head())
Output

First dataset:

total_bil sex time tip size day smoker
0 500 male lunch 30 2 sat no
1 648 male dinner 35 3 sat+ yes
2 75 female dinner 10 1 sat+ no
3 159 female dinner 12 4 sat+ yes
4 250 male lunch 40 3 sat+ no

Second dataset:

Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02

columns from first dataset

total_bil size day
0 500 2 sat
1 648 3 sat
2 75 1 sat
3 159 4 sat
4 250 3 sat

columns from second dataset

Name Total_marks Grade
0 A 700 A+
1 B 618 B+
2 A 700 A+
3 D 687 A+
4 E 611 B+

Final output

total_bil size day Name Total_marks Grade
0 500 2 sat A 700 A+
1 648 3 sat B 618 B+
2 75 1 sat A 700 A+
3 159 4 sat D 687 A+
4 250 3 sat E 611 B+

Example 4:

Input
import pandas as pd
x=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
y=pd.read_csv("D:\\ document\\datasets\\restaurant_bill.csv")
z=pd.merge(x,y,right_index=True,left_index=True)
print("First dataset")
print(x.head())
print("Second dataset")
print(y.head())
print("Final Output")
print(z.head())
Output

First dataset:

total_bil sex time tip size day smoker
0 500 male lunch 30 2 sat no
1 648 male dinner 35 3 sat+ yes
2 75 female dinner 10 1 sat+ no
3 159 female dinner 12 4 sat+ yes
4 250 male lunch 40 3 sat+ no

Second dataset:

Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02

Final Output:

3
Id Name Group_name Total_marks Grade Ranking total_bil sex time tip size day smoker
0 01 A Science 700 A+ 01 500 male lunch 30 2 sat no
1 02 B Commerce 618 B+ 02 648 male dinner 35 3 sat+ yes
2 03 A Science 700 A+ 01 75 female dinner 10 1 sat+ no
3 04 D Arts 687 A+ 01 159 female dinner 12 4 sat+ yes
4 05 E Commerce 611 B+ 02 250 male lunch 40 sat+ no

How to join datasets or columns?

Join function is work almost similar like merge.

Input
import pandas as pd
x=pd.DataFrame({"id":[1,2,3],"Name":["A","B","c"]})
y=pd.DataFrame({"Section":[1,2,3],"Year":[2020,2021,2022]})
z=x.join(y)
print(x)
print(y)
print(z)
Output
   id   Name
0   1      A
1   2      B
2   3     c
     Section   Year
  0     1       2020
  1     2       2021
  2     3       2022
    id   Name   Section   Year
0   1     A          1           2020
1   2     B          2           2021
2   3     c          3           2022

How add new column to the dataset?

Example 1:

Input
import pandas as pd
df=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
print(df.head())
df["new_str_column"]=df["Name"]+" "+df["Group_name"]
print(df.head())
df["new_str_column1"]=df["Name"]
print(df.head())
df[new_name_column]=df["Total marks"]+df["Ranking"]
Output
Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02



Id Name Group_name Total_marks Grade Ranking new_str_column
0 01 A Science 700 A+ 01 A Science
1 02 B Commerce 618 B+ 02 B Commerce
2 03 A Science 700 A+ 01 A Science
3 04 D Arts 687 A+ 01 D Arts
4 05 E Commerce 611 B+ 02 E Commerce



Id Name Group_name Total_marks Grade Ranking new_str_column new_str_column1
0 01 A Science 700 A+ 01 A Science A
1 02 B Commerce 618 B+ 02 B Commerce B
2 03 A Science 700 A+ 01 A Science A
3 04 D Arts 687 A+ 01 D Arts D
4 05 E Commerce 611 B+ 02 E Commerce E



Id Name Group_name Total_marks Grade Ranking new_str_column new_str_column1 new_name_column
0 01 A Science 700 A+ 01 A Science A 7001.0
1 02 B Commerce 618 B+ 02 B Commerce B 6182.0
2 03 A Science 700 A+ 01 A Science A 7001.0
3 04 D Arts 687 A+ 01 D Arts D 6871.0
4 05 E Commerce 611 B+ 02 E Commerce E 6112.0

For numeric columns, you can't concatenate the numeric column. You can perform mathematical operation(+,-,*,/,** etc). In example 1 you can see that in ner_nume_column you have the summation result of Total marks and Ranking. If you need to concatenate multiple numeric columns then at first you have to convert those columns data type into str.

Example 2:

Input
import pandas as pd
df=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
print(df.head())
df["Total marks"]=df["Total marks"].astype(str)
df["Ranking"]=df["Ranking"].astype(str)
df["new_column"]=df["Total marks"]+"-/-"+df["Ranking"]
df.head()
Output
Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02



Id Name Group_name Total_marks Grade Ranking new_str_column new_column
0 01 A Science 700 A+ 01 A Science 700-/-1.0
1 02 B Commerce 618 B+ 02 B Commerce 618-/-2.0
2 03 A Science 700 A+ 01 A Science 700-/-1.0
3 04 D Arts 687 A+ 01 D Arts 687-/-1.0
4 05 E Commerce 611 B+ 02 E Commerce 611-/-2.0

The assign function also can be used to add a new column.

Example 3:

Input
import pandas as pd
df=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
print(df.head())
new_concat=df["Name"]+"-/-\- "+df["Grade"]
new_col=df.assign(New_column=new_concat)
new_col.head())
Output
Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02



Id Name Group_name Total_marks Grade Ranking New_column
0 01 A Science 700 A+ 01 A-/-\- A+
1 02 B Commerce 618 B+ 02 B-/-\- B+
2 03 A Science 700 A+ 01 A-/-\- A+
3 04 D Arts 687 A+ 01 D-/-\- A+
4 05 E Commerce 611 B+ 02 E-/-\- B+

Insert function can be used to add a new column. Here you can also define the position.

Input
import pandas as pd
df=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
print(df.head())
new_concat=df["Name"]+"-/-\- "+df["Grade"]
df.insert(0,"new_column",new_concat)
new_col.head())
Output
Id Name Group_name Total_marks Grade Ranking
0 01 A Science 700 A+ 01
1 02 B Commerce 618 B+ 02
2 03 A Science 700 A+ 01
3 04 D Arts 687 A+ 01
4 05 E Commerce 611 B+ 02



New_column Id Name Group_name Total_marks Grade Ranking
0 A-/-\- A+ 01 A Science 700 A+ 01
1 B-/-\- B+ 02 B Commerce 618 B+ 02
2 A-/-\- A+ 03 A Science 700 A+ 01
3 D-/-\- A+ 04 D Arts 687 A+ 01
4 E-/-\- B+ 05 E Commerce 611 B+ 02

How to delete a column to a dataset?

del keyword, pop() function and drop function is used to delete column.

Input
import pandas as pd
df=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
print(df.head())
del df["Name"]
print(df.head())
df.pop("Grade")
print(df.head())
df.drop("Total marks",axis=1)
print(df.head())
Output
Id Group_name Total_marks Grade Ranking
0 01 Science 700 A+ 01
1 02 Commerce 618 B+ 02
2 03 Science 700 A+ 01
3 04 Arts 687 A+ 01
4 05 Commerce 611 B+ 02



Id Group_name Total_marks Ranking
0 01 Science 700 01
1 02 Commerce 618 02
2 03 Science 700 01
3 04 Arts 687 01
4 05 Commerce 611 02



Id Group_name Ranking
0 01 Science 01
1 02 Commerce 02
2 03 Science 01
3 04 Arts 01
4 05 Commerce 02

How to split a column?

Input
import pandas as pd
df=pd.read_csv("D:\\ document\\datasets\\grade_dataset.csv")
print(df.head())
df[["New_name","New_group"]]=df["Combine"].str.split(" ",expand=True)
print(df.head())
Output
Id Name Group_name Total_marks Grade Ranking Combine
0 01 A Science 700 A+ 01 A Science
1 02 B Commerce 618 B+ 02 B Commerce
2 03 A Science 700 A+ 01 A Science
3 04 D Arts 687 A+ 01 D Arts
4 05 E Commerce 611 B+ 02 E Commerce



Id Name Group_name Total_marks Grade Ranking Combine New_name New_group
0 01 A Science 700 A+ 01 A Science A Science
1 02 B Commerce 618 B+ 02 B Commerce B Commerce
2 03 A Science 700 A+ 01 A Science A Science
3 04 D Arts 687 A+ 01 D Arts D Arts
4 05 E Commerce 611 B+ 02 E Commerce E Commerce

CodersAim is created for learning and training a self learner to become a professional from beginner. While using CodersAim, you agree to have read and accepted our terms of use, privacy policy, Contact Us

© Copyright All rights reserved www.CodersAim.com. Developed by CodersAim.