Machine learning exercise 1: Car Price Prediction

Dataset Link

Importing Libraries

#Basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn import metrics

Getting the data

df = pd.read_csv("D:/car data.csv")
df.head()

Gathering some information about the data

df.shape

df.describe()

df.info()

Let's build a table showing the percentage of missing values in each column, sorted from highest to lowest

x=[]
z=[]
for i in df:
    v=df[i].isnull().sum()/df.shape[0]*100
    x.append(v)
    z.append(i)
q={"Feature Name":z,"Percentage of missing values":x}
missingPercentageDataset=pd.DataFrame(q).sort_values(by="Percentage of missing values",ascending=False)
pd.set_option("display.max_rows",None)
missingPercentageDataset
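
The same table can also be built without an explicit loop. The following is a minimal sketch, assuming a small made-up DataFrame (the `toy` frame and its values are hypothetical and only stand in for the car dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the car dataset
toy = pd.DataFrame({
    "Year": [2014, 2013, np.nan, 2011],
    "Fuel_Type": ["Petrol", None, None, "Diesel"],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85],
})

# isnull().mean() gives the fraction of missing values per column
missing_pct = (
    toy.isnull().mean().mul(100)
    .rename("Percentage of missing values")
    .rename_axis("Feature Name")
    .reset_index()
    .sort_values(by="Percentage of missing values", ascending=False)
    .reset_index(drop=True)
)
print(missing_pct)
```

Taking the mean of the boolean mask is equivalent to dividing the count of missing values by the number of rows, which is what the loop above does.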

print("Fuel type")
print(df["Fuel_Type"].value_counts())
print("================")

print("Seller_Type")
print(df["Seller_Type"].value_counts())
print("================")

print("Transmission")
print(df["Transmission"].value_counts())

Label encoding

df.replace({"Fuel_Type":{"Petrol":2,"Diesel":1,"CNG":0}},inplace=True)

df.replace({"Seller_Type":{"Dealer":1,"Individual":0}},inplace=True)

df.replace({"Transmission":{"Manual":1,"Automatic":0}},inplace=True)

df.head()
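
As an alternative sketch, the same integer codes can be applied column by column with `Series.map`. The rows in `toy` below are made up for illustration; only the column names and category labels come from the code above. One possible advantage of `map` is that any category missing from the mapping becomes NaN, so unexpected values are easy to spot.

```python
import pandas as pd

# Hypothetical rows; only the column names and categories come from the tutorial
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

# Same integer codes as the replace() calls above
encodings = {
    "Fuel_Type": {"Petrol": 2, "Diesel": 1, "CNG": 0},
    "Seller_Type": {"Dealer": 1, "Individual": 0},
    "Transmission": {"Manual": 1, "Automatic": 0},
}

encoded = toy.copy()
for column, mapping in encodings.items():
    # map() leaves NaN for any category missing from the mapping
    encoded[column] = encoded[column].map(mapping)

print(encoded)
```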

Scaling the data

independent_features=df.drop(columns=["Car_Name","Selling_Price"])

scaler= StandardScaler()
transform_scale_data=scaler.fit_transform(independent_features)
transform_scale_data

Separating the data and label

X=transform_scale_data
Y=df["Selling_Price"]

X

Splitting data into train and test data

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2, random_state=1)
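
In this exercise the scaler is fitted on the whole dataset before the split, so the test rows influence the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data; the arrays `X_raw` and `y` are made up for illustration and are not the car dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and target, purely for illustration
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = X_raw @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.5, size=100)

# Split first, with the same test_size and random_state as the tutorial
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, random_state=1
)

# Learn mean and standard deviation from the training rows only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_raw)
# Reuse the training statistics for the test rows
X_test_scaled = scaler.transform(X_test_raw)

print(X_train_scaled.mean(axis=0).round(6))
print(X_test_scaled.shape)
```

With this order the test set stays unseen until evaluation, so the test metrics are a fairer estimate of performance on new data.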

Train the model using linear regression algorithm

model= LinearRegression()
model.fit(X_train,Y_train)

Model evaluation

We can't use accuracy score for a regression problem because the target is a continuous numerical value, so a prediction is rarely exactly equal to the actual value. Instead of counting correctly predicted values, we measure how close the predictions are to the actual values using metrics such as R squared and mean absolute error. These metrics are used for regression problems, while metrics such as accuracy score are used for classification problems.
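
To make the two metrics concrete, the sketch below computes them by hand and compares the results with scikit-learn. The `actual` and `predicted` arrays are made-up numbers, not output from the car price model.

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted values
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 8.0])

# Mean absolute error: average absolute difference
mae_manual = np.mean(np.abs(actual - predicted))

# R squared: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, metrics.mean_absolute_error(actual, predicted))
print(r2_manual, metrics.r2_score(actual, predicted))
```

A lower mean absolute error and an R squared closer to 1 both indicate predictions that are closer to the actual values.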

Train data evaluation

training_data_prediction=model.predict(X_train)
training_data_prediction

Visualization of the actual value and training data predicted value

plt.scatter(Y_train,training_data_prediction)
plt.xlabel("actual value")
plt.ylabel("Predicted value")
plt.title("Actual value vs Predicted value")
plt.show()

R squared score for training data

training_data_prediction holds the predictions of our model and Y_train holds the actual values of the Selling_Price column. Let's use r2_score to compare them. R squared is a score rather than an error: the closer it is to 1, the better the predictions fit the actual values.

score_using_r2_score=metrics.r2_score(Y_train,training_data_prediction)
score_using_r2_score

Mean absolute error for training data

training_data_prediction holds the predictions of our model and Y_train holds the actual values of the Selling_Price column. Let's use mean_absolute_error to measure the average absolute difference between them. The lower the value, the closer the predictions are to the actual values.

score_using_mean_absolute_error=metrics.mean_absolute_error(Y_train,training_data_prediction)
score_using_mean_absolute_error

Prediction on the testing data

testing_data_prediction=model.predict(X_test)
testing_data_prediction

Visualization of the actual value and testing data predicted value

plt.scatter(Y_test,testing_data_prediction)
plt.xlabel("actual value")
plt.ylabel("Predicted value")
plt.title("Actual value vs Predicted value")
plt.show()

R squared score for testing data

score_using_r2_score=metrics.r2_score(Y_test,testing_data_prediction)
score_using_r2_score

Mean absolute error for testing data

score_using_mean_absolute_error=metrics.mean_absolute_error(Y_test,testing_data_prediction)
score_using_mean_absolute_error
