Machine learning exercise 6: Boston House Price Prediction

Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn.datasets
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn import metrics

Getting the data

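# Note: load_boston() was removed in scikit-learn 1.2, so this call needs an older scikit-learn release.
df1 = sklearn.datasets.load_boston()
df = pd.DataFrame(df1.data, columns=df1.feature_names)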
df

Add the target (price) column to the dataframe

df["Price"]=df1.target
df.head()

Gathering some information

df.shape

df.describe()

df.info()

Let's build a table showing the percentage of missing values in each column, sorted from highest to lowest

x=[]
z=[]
for i in df:
    v=df[i].isnull().sum()/df.shape[0]*100
    x.append(v)
    z.append(i)
q={"Feature Name":z,"Percentage of missing values":x,}
missingPercentageDataset=pd.DataFrame(q).sort_values(by="Percentage of missing values",ascending=False)
pd.set_option("display.max_rows",None)
missingPercentageDataset
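
The same table can also be built without an explicit loop. The sketch below is one possible pandas-only version; the toy dataframe and its column names are made up for illustration and are not part of the Boston data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe, used only to illustrate the calculation
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [np.nan, 2.0, 3.0, 4.0],
})

# isnull() gives a boolean frame; the mean of each boolean column
# is the fraction of missing values in that column
missing_pct = (
    toy.isnull().mean().mul(100)
    .sort_values(ascending=False)
    .rename_axis("Feature Name")
    .reset_index(name="Percentage of missing values")
)
print(missing_pct)
```

Replacing toy with df should give the same kind of table as the loop above.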

Correlation between features

correlation_1= df.corr()

plt.figure(figsize=(10,10))
sns.heatmap(correlation_1,cbar=True, square=True, fmt=".1f",annot=True,annot_kws={"size":8},cmap="Blues")
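
Besides the heatmap, it can help to rank the features by how strongly they correlate with the target. The following is a minimal sketch on a made-up dataframe (the column names rooms, crime and noise are hypothetical); with the real data you would start from correlation_1 instead.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe
toy = pd.DataFrame({
    "rooms": [4, 5, 6, 7, 8],
    "crime": [9, 7, 6, 3, 1],
    "noise": [1, 3, 2, 3, 1],
    "Price": [10, 12, 14, 16, 18],
})

# Take the Price column of the correlation matrix, drop Price itself,
# and sort by absolute value so strong negative correlations rank high too
corr_with_price = (
    toy.corr()["Price"]
    .drop("Price")
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_price)
```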

Separating data and labels

X=df.drop(["Price"],axis=1)
Y=df["Price"]

Splitting data into train and test data

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2, random_state=1)
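
To see what test_size and random_state do, here is a small sketch on made-up data (ten rows, not the Boston data). With test_size=0.2, two of the ten rows go to the test set, and reusing the same random_state gives the same split every time.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 features
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# First split: 80% train, 20% test
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=1
)
# Second split with the same random_state, to check reproducibility
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=1
)

print(Xa_train.shape, Xa_test.shape)
print(np.array_equal(ya_test, yb_test))
```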

Training model using XGBoost Regressor

With XGBoost Regressor we only need to fit the model. There is no separate transform step after fitting.

model=XGBRegressor()
model.fit(X_train,Y_train)

We can't use the accuracy score for a regression problem because the targets are continuous numerical values, so predictions rarely match them exactly. Instead of counting correct predictions, we measure how far the predictions are from the actual values using metrics such as R squared and mean absolute error. Metrics of this kind are used for regression problems, while accuracy-style metrics are used for classification problems.
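
To make these two metrics concrete, here is a small worked sketch that computes them by hand with NumPy on made-up numbers. These are the same quantities that metrics.mean_absolute_error and metrics.r2_score return; the arrays y_true and y_pred are hypothetical.

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# Mean absolute error: average size of the prediction errors
mae = np.mean(np.abs(y_true - y_pred))

# R squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae)  # 0.5
print(r2)   # 0.9
```

A lower mean absolute error and an R squared closer to 1 both mean the predictions are closer to the actual values.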

Prediction on training data

training_data_prediction=model.predict(X_train)
training_data_prediction

Visualization of the actual value and training data predicted value

plt.scatter(Y_train,training_data_prediction)
plt.xlabel("actual value")
plt.ylabel("Predicted value")
plt.title("Actual value vs Predicted value")
plt.show()

R squared score for training data

In training_data_prediction we have the predictions of our model, and in Y_train we have the actual values of the Price column. Let's use r2_score to compare them. A score closer to 1 means the predictions match the actual values more closely.

score_using_r2_score=metrics.r2_score(Y_train,training_data_prediction)
score_using_r2_score

Mean absolute error for training data

In training_data_prediction we have the predictions of our model, and in Y_train we have the actual values of the Price column. Let's use mean_absolute_error to measure the error. A lower value means the predictions are closer to the actual values.

score_using_mean_absolute_error=metrics.mean_absolute_error(Y_train,training_data_prediction)
score_using_mean_absolute_error

Prediction on the testing data

testing_data_prediction=model.predict(X_test)
testing_data_prediction

Visualization of the actual value and testing data predicted value

plt.scatter(Y_test,testing_data_prediction)
plt.xlabel("actual value")
plt.ylabel("Predicted value")
plt.title("Actual value vs Predicted value")
plt.show()

R squared score for testing data

score_using_r2_score=metrics.r2_score(Y_test,testing_data_prediction)
score_using_r2_score

Mean absolute error for testing data

score_using_mean_absolute_error=metrics.mean_absolute_error(Y_test,testing_data_prediction)
score_using_mean_absolute_error
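
Since the same two metrics are computed for both the training and the testing data, it can be convenient to wrap them in a small helper and look at the results side by side. The sketch below is one way to do that; regression_scores is a hypothetical helper name, and the numbers are made-up stand-ins for (Y_train, training_data_prediction) and (Y_test, testing_data_prediction).

```python
from sklearn import metrics

def regression_scores(y_true, y_pred):
    """Return R squared and mean absolute error for one set of predictions."""
    return {
        "r2": metrics.r2_score(y_true, y_pred),
        "mae": metrics.mean_absolute_error(y_true, y_pred),
    }

# Hypothetical values standing in for the training and testing results
train_scores = regression_scores([3.0, 5.0, 7.0, 9.0], [3.0, 5.0, 7.0, 9.0])
test_scores = regression_scores([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])

# Difference between training and testing R squared
gap = train_scores["r2"] - test_scores["r2"]
print(train_scores, test_scores, gap)
```

A training score that is much better than the testing score is commonly read as a sign that the model is overfitting the training data.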

© Copyright All rights reserved www.CodersAim.com. Developed by CodersAim.