In pandas you often work with large datasets. Data can come in CSV, Excel, JSON, and other formats, and it can be stored in different places. To work with data in pandas, you first have to get the data and read it. Only after getting and reading the data can you perform the various operations on it.
To read or get data, use a read_() function. Inside the parentheses, write the full path of the file in single or double quotes, ending with a forward slash and then the name of the file with its extension.
Be careful about one thing: change every backslash in the file path into a forward slash.
For example, to read a CSV file write:
pd.read_csv("E:/my files/file name.extension")
Let's see how we can read CSV, Excel, JSON, text, and table files using pandas:
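A minimal sketch of the read functions mentioned above. The CSV here is kept in memory with StringIO so the example runs anywhere; the commented-out calls show the same pattern with file paths, which are placeholders, not real files.

```python
import pandas as pd
from io import StringIO

# A tiny CSV kept in memory; a real call would take a path such as
# pd.read_csv("E:/my files/data.csv") -- the path is only illustrative.
csv_text = "Id,Name,Marks\n01,A,700\n02,B,618\n"
df = pd.read_csv(StringIO(csv_text))

# Other formats follow the same pattern (paths are placeholders):
# pd.read_excel("E:/my files/data.xlsx")   # Excel workbook
# pd.read_json("E:/my files/data.json")    # JSON file
# pd.read_table("E:/my files/data.txt")    # text file; tab-separated by default

print(df.shape)   # (2, 3): two rows, three columns
```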
To read zip files, a module named zipfile is used. If the file inside the zip archive is in CSV format, use read_csv; if Excel, use read_excel; and so on for other formats. Pass an extra parameter, compression, to the read function, with the value compression='zip'.
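A runnable sketch of the zip workflow: the zipfile module first builds a small archive (file names are illustrative), and compression='zip' then tells read_csv to decompress before parsing.

```python
import zipfile
import pandas as pd

# Build a small zip archive containing one CSV (names are illustrative).
with zipfile.ZipFile("data.zip", "w") as zf:
    zf.writestr("data.csv", "Id,Name,Marks\n01,A,700\n02,B,618\n")

# compression='zip' makes pandas decompress the archive before parsing;
# the archive must contain exactly one data file.
df = pd.read_csv("data.zip", compression="zip")
print(len(df))   # 2
```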
To read tables from an HTML page, the read_html() function is used; pass the URL of that HTML page to the function. A page can contain multiple tables. read_html() reads all of them and returns them as a list, so you can pick out a single table by its index number. For example, if a page contains 5 tables, read_html() returns them as a list, and index numbers 0, 1, 2, 3, 4 each give you one table.
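A small sketch of the list-of-tables behaviour. To keep it runnable, the HTML is an in-memory string with two tables; with a live page you would pass the URL instead (the example URL in the comment is a placeholder). Note that read_html() needs an HTML parser such as lxml installed.

```python
import pandas as pd
from io import StringIO

# Two small tables in one HTML document. With a real page you would write
# something like pd.read_html("https://example.com/page") instead.
html = """
<table><tr><th>Id</th><th>Name</th></tr><tr><td>1</td><td>A</td></tr></table>
<table><tr><th>Grade</th></tr><tr><td>A+</td></tr></table>
"""
tables = pd.read_html(StringIO(html))   # returns a list of DataFrames
print(len(tables))    # 2 -- one DataFrame per <table>
first = tables[0]     # pick a single table by its index number
```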
First, go to the SQL learning section, learn SQL, and then come back here. If you know how to work with MySQL, continue.
Here we import mysql.connector. This is used to connect you to MySQL so that you can get data from it. You need to pass some MySQL connection details: host name, user name, password, and database name. The table you read comes from the database you passed.
Suppose your dataframe doesn't contain column names, or it does but you want to promote another row to be the column names. In these cases you use the header parameter. To make a particular row the column names (the header) of the dataframe, write the header parameter inside the read function and pass the index number of that row. If your dataset doesn't contain any header or column names, pass header=None. Pandas will then use its default column names, which are integers like 0, 1, 2, 3, etc.
Note: if the dataset has no heading, you must use header=None, because otherwise the first row of the dataset becomes the heading of each column.
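The three header settings described above can be sketched on a small in-memory CSV (the data itself is illustrative):

```python
import pandas as pd
from io import StringIO

data = "Id,Name,Marks\n01,A,700\n02,B,618\n"

# header=0 (the default): the first row becomes the column names.
df = pd.read_csv(StringIO(data))
print(list(df.columns))        # ['Id', 'Name', 'Marks']

# header=None: every row stays data; pandas assigns integer column names.
df_none = pd.read_csv(StringIO(data), header=None)
print(list(df_none.columns))   # [0, 1, 2]

# header=1: the row at index 1 becomes the header; rows above it are dropped.
df_row1 = pd.read_csv(StringIO(data), header=1)
print(list(df_row1.columns))   # ['01', 'A', '700']
```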
Suppose you have a dataframe with no header or column names, and you want to add a header or assign a name to each column. To do this, first use the header parameter while reading the file with the value None. By doing this you are saying that the dataframe has no header. Now you can add one: to add a header or column names to the dataframe, assign a list of names to its columns attribute. But be careful about one thing: the number of names must be equal to the number of columns present in the dataset.
Let's see the data:
| 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
| Id | Name | Group name | Total marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
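The before/after tables above can be reproduced like this: read with header=None, then assign the six names to the columns attribute (the data values are the illustrative ones from the tables).

```python
import pandas as pd
from io import StringIO

# Data with no header row, as in the first table above.
raw = "01,A,Science,700,A+,01\n02,B,Commerce,618,B+,02\n"
df = pd.read_csv(StringIO(raw), header=None)   # integer column names 0..5

# Assign one name per column; the list length must equal the column count.
df.columns = ["Id", "Name", "Group name", "Total marks", "Grade", "Ranking"]
print(list(df.columns))
```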
To rename an existing column, the rename() function is used.
Let's see the data:
| Id | Name | Group_name | Total_marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
| Id | Name | Group Name | Marks | Grade | Ranking |
---|---|---|---|---|---|---|
0 | 01 | A | Science | 700 | A+ | 01 |
1 | 02 | B | Commerce | 618 | B+ | 02 |
2 | 03 | A | Science | 700 | A+ | 01 |
3 | 04 | D | Arts | 687 | A+ | 01 |
4 | 05 | E | Commerce | 611 | B+ | 02 |
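The renaming shown in the tables above (Group_name to Group Name, Total_marks to Marks) can be sketched as:

```python
import pandas as pd

df = pd.DataFrame({"Group_name": ["Science"], "Total_marks": [700]})

# rename() takes a {old_name: new_name} mapping; columns not listed keep
# their names. Without inplace=True it returns a new DataFrame.
df = df.rename(columns={"Group_name": "Group Name", "Total_marks": "Marks"})
print(list(df.columns))   # ['Group Name', 'Marks']
```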
In the usecols parameter, pass the column index numbers as a list. The read function will then load only those columns from the dataset whose index numbers are given in usecols.
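A small sketch of usecols on an in-memory CSV (the data is illustrative): only the columns at index 0 and 3 are loaded.

```python
import pandas as pd
from io import StringIO

data = "Id,Name,Group,Marks\n01,A,Science,700\n02,B,Commerce,618\n"

# Keep only the columns at index 0 and 3; the others are never loaded.
df = pd.read_csv(StringIO(data), usecols=[0, 3])
print(list(df.columns))   # ['Id', 'Marks']
```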
In the skiprows parameter, pass the row index numbers as a list. Those rows will be skipped while reading and will not appear in the dataframe.
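A sketch of skiprows on the same kind of in-memory CSV: line 0 is still read as the header, while file lines 1 and 2 are skipped.

```python
import pandas as pd
from io import StringIO

data = "Id,Name\n01,A\n02,B\n03,C\n"

# Skip the file lines at positions 1 and 2 (the rows for A and B);
# line 0 is still read as the header.
df = pd.read_csv(StringIO(data), skiprows=[1, 2])
print(df["Name"].tolist())   # ['C']
```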
Sometimes a dataset has so many rows and columns that printing it does not show all of them. To see all the rows and columns, use the set_option function.
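The display limits live in pandas' options system; setting them to None removes the truncation:

```python
import pandas as pd

# Lift the display truncation limits; None means "no limit".
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

print(pd.get_option("display.max_rows"))   # None
```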
Code:
dataFrame_var.to_fileFormat("file name.file_format_extension")
Here dataFrame_var is the variable where you stored the dataframe after reading, and fileFormat is the format in which you want to save your file. In the parentheses, pass a name for the new saved file, followed by the extension matching the selected file format. For Excel use to_excel, for CSV use the to_csv function, and in the same way for JSON use to_json.
The file will be saved to the Jupyter notebook's working directory.
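A runnable sketch of the writer functions (file names are illustrative); each to_ writer mirrors a read_ reader, and a path without a directory lands in the current working directory:

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2], "Name": ["A", "B"]})

# Each writer mirrors a reader: to_csv <-> read_csv, to_json <-> read_json.
# A bare file name saves into the current working directory
# (e.g. the Jupyter notebook folder).
df.to_csv("students.csv", index=False)   # index=False drops the row index
df.to_json("students.json")

back = pd.read_csv("students.csv")
print(back.equals(df))   # True
```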