In pandas you often work with large datasets. Data can come in CSV, Excel, JSON, and other formats, and it can be stored in different places. To work with data in pandas, you first have to get the data and read it. Only after getting and reading the data can you perform the various operations on it.
To read or get data, use a read_() function. Inside the parentheses, write the full path of the file in single or double quotes, ending with a forward slash and then the name of the file with its extension.
Be careful about one thing: change every backslash in the file path into a forward slash.
For example, to read a CSV file write:
pd.read_csv("E:/my files/file name.extension")
Let's see how we can read CSV, Excel, JSON, text, and table files using pandas:
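A minimal sketch of the read functions mentioned above. The CSV here is kept in memory with StringIO so the example runs anywhere; the commented-out calls show the same pattern with file paths, which are placeholders, not real files.

```python
import pandas as pd
from io import StringIO

# A tiny CSV kept in memory; a real call would take a path such as
# pd.read_csv("E:/my files/data.csv") -- the path is only illustrative.
csv_text = "Id,Name,Marks\n01,A,700\n02,B,618\n"
df = pd.read_csv(StringIO(csv_text))

# Other formats follow the same pattern (paths are placeholders):
# pd.read_excel("E:/my files/data.xlsx")   # Excel workbook
# pd.read_json("E:/my files/data.json")    # JSON file
# pd.read_table("E:/my files/data.txt")    # text file; tab-separated by default

print(df.shape)   # (2, 3): two rows, three columns
```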
To read zip files, a module named zipfile is used. If the file inside the zip archive is in CSV format, use read_csv; if Excel, use read_excel; and so on for other formats. Pass an extra parameter, compression, to the read function, with the value compression='zip'.
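A runnable sketch of the zip workflow: the zipfile module first builds a small archive (file names are illustrative), and compression='zip' then tells read_csv to decompress before parsing.

```python
import zipfile
import pandas as pd

# Build a small zip archive containing one CSV (names are illustrative).
with zipfile.ZipFile("data.zip", "w") as zf:
    zf.writestr("data.csv", "Id,Name,Marks\n01,A,700\n02,B,618\n")

# compression='zip' makes pandas decompress the archive before parsing;
# the archive must contain exactly one data file.
df = pd.read_csv("data.zip", compression="zip")
print(len(df))   # 2
```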
To read tables from an HTML page, the read_html() function is used; pass the URL of that HTML page to the function. A page can contain multiple tables. read_html() reads all of them and returns them as a list, so you can pick out a single table by its index number. For example, if a page contains 5 tables, read_html() returns them as a list, and index numbers 0, 1, 2, 3, 4 each give you one table.
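A small sketch of the list-of-tables behaviour. To keep it runnable, the HTML is an in-memory string with two tables; with a live page you would pass the URL instead (the example URL in the comment is a placeholder). Note that read_html() needs an HTML parser such as lxml installed.

```python
import pandas as pd
from io import StringIO

# Two small tables in one HTML document. With a real page you would write
# something like pd.read_html("https://example.com/page") instead.
html = """
<table><tr><th>Id</th><th>Name</th></tr><tr><td>1</td><td>A</td></tr></table>
<table><tr><th>Grade</th></tr><tr><td>A+</td></tr></table>
"""
tables = pd.read_html(StringIO(html))   # returns a list of DataFrames
print(len(tables))    # 2 -- one DataFrame per <table>
first = tables[0]     # pick a single table by its index number
```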
First, go to the SQL learning section, learn SQL, and then come back here. If you know how to work with MySQL, continue.
Here we import mysql.connector. This is used to connect you to MySQL so that you can get data from it. You need to pass some MySQL connection details: host name, user name, password, and database name. The table you read comes from the database you passed.
Suppose your dataframe doesn't contain column names, or it does but you want to promote another row to be the column names. In these cases you use the header parameter. To make a particular row the column names (the header) of the dataframe, write the header parameter inside the read function and pass the index number of that row. If your dataset doesn't contain any header or column names, pass header=None. Pandas will then use its default column names, which are integers like 0, 1, 2, 3, etc.
Note: if the dataset has no heading, you must use header=None, because otherwise the first row of the dataset becomes the heading of each column.
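The three header settings described above can be sketched on a small in-memory CSV (the data itself is illustrative):

```python
import pandas as pd
from io import StringIO

data = "Id,Name,Marks\n01,A,700\n02,B,618\n"

# header=0 (the default): the first row becomes the column names.
df = pd.read_csv(StringIO(data))
print(list(df.columns))        # ['Id', 'Name', 'Marks']

# header=None: every row stays data; pandas assigns integer column names.
df_none = pd.read_csv(StringIO(data), header=None)
print(list(df_none.columns))   # [0, 1, 2]

# header=1: the row at index 1 becomes the header; rows above it are dropped.
df_row1 = pd.read_csv(StringIO(data), header=1)
print(list(df_row1.columns))   # ['01', 'A', '700']
```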
Suppose you have a dataframe with no header or column names, and you want to add a header or assign a name to each column. To do this, first use the header parameter while reading the file with the value None. By doing this you are saying that the dataframe has no header. Now you can add one: to add a header or column names to the dataframe, assign a list of names to its columns attribute. But be careful about one thing: the number of names must be equal to the number of columns present in the dataset.
Let's see the data:
| 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
| Id | Name | Group name | Total marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
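The before/after tables above can be reproduced like this: read with header=None, then assign the six names to the columns attribute (the data values are the illustrative ones from the tables).

```python
import pandas as pd
from io import StringIO

# Data with no header row, as in the first table above.
raw = "01,A,Science,700,A+,01\n02,B,Commerce,618,B+,02\n"
df = pd.read_csv(StringIO(raw), header=None)   # integer column names 0..5

# Assign one name per column; the list length must equal the column count.
df.columns = ["Id", "Name", "Group name", "Total marks", "Grade", "Ranking"]
print(list(df.columns))
```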
To rename an existing column, the rename() function is used.
Let's see the data:
| Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
| Id | Name | Group Name | Marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
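The renaming shown in the tables above (Group_name to Group Name, Total_marks to Marks) can be sketched as:

```python
import pandas as pd

df = pd.DataFrame({"Group_name": ["Science"], "Total_marks": [700]})

# rename() takes a {old_name: new_name} mapping; columns not listed keep
# their names. Without inplace=True it returns a new DataFrame.
df = df.rename(columns={"Group_name": "Group Name", "Total_marks": "Marks"})
print(list(df.columns))   # ['Group Name', 'Marks']
```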
In the usecols parameter, pass the column index numbers as a list. The read function will then load only those columns from the dataset whose index numbers are given in usecols.
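A small sketch of usecols on an in-memory CSV (the data is illustrative): only the columns at index 0 and 3 are loaded.

```python
import pandas as pd
from io import StringIO

data = "Id,Name,Group,Marks\n01,A,Science,700\n02,B,Commerce,618\n"

# Keep only the columns at index 0 and 3; the others are never loaded.
df = pd.read_csv(StringIO(data), usecols=[0, 3])
print(list(df.columns))   # ['Id', 'Marks']
```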
In the skiprows parameter, pass the row index numbers as a list. Those rows will be skipped while reading and will not appear in the dataframe.
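A sketch of skiprows on the same kind of in-memory CSV: line 0 is still read as the header, while file lines 1 and 2 are skipped.

```python
import pandas as pd
from io import StringIO

data = "Id,Name\n01,A\n02,B\n03,C\n"

# Skip the file lines at positions 1 and 2 (the rows for A and B);
# line 0 is still read as the header.
df = pd.read_csv(StringIO(data), skiprows=[1, 2])
print(df["Name"].tolist())   # ['C']
```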
Sometimes a dataset has so many rows and columns that printing it does not show all of them. To see all the rows and columns, use the set_option function.
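The display limits live in pandas' options system; setting them to None removes the truncation:

```python
import pandas as pd

# Lift the display truncation limits; None means "no limit".
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

print(pd.get_option("display.max_rows"))   # None
```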
Code:
dataFrame_var.to_fileFormat("file name.file_format_extension")
Here dataFrame_var is the variable where you stored the dataframe after reading, and fileFormat is the format in which you want to save your file. In the parentheses, pass a name for the new saved file, followed by the extension matching the selected file format. For Excel use to_excel, for CSV use the to_csv function, and in the same way for JSON use to_json.
The file will be saved to the Jupyter notebook's working directory.
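A runnable sketch of the writer functions (file names are illustrative); each to_ writer mirrors a read_ reader, and a path without a directory lands in the current working directory:

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2], "Name": ["A", "B"]})

# Each writer mirrors a reader: to_csv <-> read_csv, to_json <-> read_json.
# A bare file name saves into the current working directory
# (e.g. the Jupyter notebook folder).
df.to_csv("students.csv", index=False)   # index=False drops the row index
df.to_json("students.json")

back = pd.read_csv("students.csv")
print(back.equals(df))   # True
```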