Machine learning exercise 3: Diabetes Prediction

Dataset Link

Importing Libraries

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

Getting the data

df = pd.read_csv("D:/diabetes.csv")
df.head()

Let's gather some information about the data

df.shape

df.describe()

df.info()

Let's build a table showing the percentage of missing values in each column, sorted from highest to lowest

x=[]
z=[]
for i in df:
    v=df[i].isnull().sum()/df.shape[0]*100
    x.append(v)
    z.append(i)
q={"Feature Name":z,"Percentage of missing values":x}
missingPercentageDataset=pd.DataFrame(q).sort_values(by="Percentage of missing values",ascending=False)
pd.set_option("display.max_rows",None)
missingPercentageDataset
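The loop above can also be written as a single pandas expression. The following is a minimal sketch on a small made-up frame (the `toy` data and its values are assumptions for illustration, not the diabetes dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; the values are illustrative only.
toy = pd.DataFrame({
    "Glucose": [148, np.nan, 183, 89],
    "BMI": [33.6, 26.6, np.nan, np.nan],
    "Age": [50, 31, 32, 21],
})

# isnull() returns booleans; their column-wise mean is the fraction missing.
missing_pct = (
    toy.isnull().mean().mul(100)
    .sort_values(ascending=False)
    .rename("Percentage of missing values")
)
print(missing_pct)
```

Replacing `toy` with `df` should give the same percentages as the loop, as a Series indexed by feature name rather than a two-column DataFrame.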

Separating data and labels

x=df.drop(columns="Outcome")
y=df["Outcome"]

Data standardization

We perform standardization only on the independent variables, not on the dependent variable.

scaler= StandardScaler()
transform_scale_data=scaler.fit_transform(x)
transform_scale_data
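To make the standardization step concrete, here is a small sketch that compares StandardScaler with the manual formula (value minus column mean, divided by column standard deviation). The `demo` array is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: 3 samples, 2 features on very different scales.
demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

demo_scaler = StandardScaler()
scaled = demo_scaler.fit_transform(demo)

# Manual version: subtract the column mean, divide by the column
# standard deviation (population form, ddof=0, which np.std uses by default).
manual = (demo - demo.mean(axis=0)) / demo.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, each column has mean 0 and standard deviation 1, so no single feature dominates only because of its units.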

Let's see the final X (all the independent variables) and Y (the dependent variable)

A natural question here: we already put the independent variables in x and the dependent variable in y, so why are we doing the same thing again?
The answer is simple. In a classification problem we standardize only the independent variables, not the dependent variable. Earlier we only separated the dataset: the independent variables went into x and the dependent variable into y. Then we standardized x. The model is trained on these transformed independent variables, so we now assign the transformed values to X and the dependent variable to Y.

X=transform_scale_data
Y=df["Outcome"]

Splitting data into train and test data

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,stratify=Y, random_state=51)
#the stratify parameter keeps the class proportions the same in the train and test sets; it is used in classification problems
print(X.shape, X_train.shape, X_test.shape)
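The effect of `stratify` is easiest to see on labels with a known class ratio. This is a sketch on synthetic labels (the 35/65 ratio and the array names are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 35 positives and 65 negatives out of 100 samples.
labels = np.array([1] * 35 + [0] * 65)
features = np.arange(100).reshape(-1, 1)

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=51
)

# With stratify, the fraction of positives in each split
# matches the fraction in the full label array.
print(labels.mean(), l_train.mean(), l_test.mean())
```

Without `stratify`, the test set could by chance contain noticeably more or fewer positives than the training set, which makes the accuracy comparison less reliable on small datasets.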

Training the model using support vector machine classifier

model= svm.SVC(kernel="linear")
model.fit(X_train, Y_train)
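This tutorial scales the whole dataset before splitting it. A common alternative is to fit the scaler on the training split only, so that no statistics from the test rows reach the model. The sketch below shows one way to do that with a scikit-learn pipeline; it uses synthetic data from `make_classification` rather than the diabetes file, and the variable names are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the dataset: 200 samples, 8 features.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=51)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=51
)

# The pipeline fits the scaler on the training split only,
# then applies the same scaling before every predict/score call.
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(Xd_train, yd_train)

test_acc = pipe.score(Xd_test, yd_test)
print("accuracy on testing data:", test_acc)
```

A further convenience of this design is that `pipe.predict` accepts raw, unscaled input, since the scaling step is part of the fitted object.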

Model evaluation

For Train Data

X_train_prediction= model.predict(X_train)
training_data_accuracy=accuracy_score(Y_train, X_train_prediction)
print("accuracy on training data:",training_data_accuracy )

For Test Data

X_test_prediction= model.predict(X_test)
testing_data_accuracy=accuracy_score(Y_test, X_test_prediction)
print("accuracy on testing data:",testing_data_accuracy )
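Accuracy is the fraction of predictions that match the true labels. The short sketch below checks this by hand on made-up labels and also prints a confusion matrix, which shows where the errors fall (the `y_true` and `y_pred` arrays are illustrative, not model output):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and predictions for 8 samples.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

acc = accuracy_score(y_true, y_pred)      # arguments: true labels first, then predictions
manual_acc = (y_true == y_pred).mean()    # same quantity computed by hand

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

print(acc, manual_acc)
print(cm)
```

Six of the eight predictions are correct, so both values are 0.75. The off-diagonal entries of the matrix count one false positive and one false negative.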

Let's do predictions

#getting the input
input_data=(8,183,64,0,0,23.3,0.672,32 )

#changing the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

#reshape the np array as we are predicting for one instance
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

#standardization of input data

'''
We can't pass the raw input straight to the model, because the model was trained on standardized independent variables. The values entered for prediction must therefore be standardized in the same way.

The fitted StandardScaler is stored in the scaler variable, so we reuse it here. We don't fit it again on the new data; we only call transform.
'''
std_data=scaler.transform(input_data_reshaped)

prediction=model.predict(std_data)

if prediction[0]==0:
    print("No diabetes")
else:
    print("Have Diabetes")
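The note above about calling transform rather than fitting again can be checked on a small example. In this sketch (made-up numbers, hypothetical variable names), refitting a scaler on a single row turns every value into 0, while reusing the fitted scaler keeps the row comparable to the training data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data: 3 samples, 2 features.
train = np.array([[1.0, 100.0], [3.0, 200.0], [5.0, 300.0]])
fitted = StandardScaler().fit(train)

# One new sample to predict on.
new_row = np.array([[5.0, 100.0]])

# Correct: reuse the mean and standard deviation learned from the training data.
reused = fitted.transform(new_row)

# Incorrect: fitting on the single row makes the row its own mean,
# so every feature becomes 0 and the information is lost.
refit = StandardScaler().fit_transform(new_row)

print(reused)
print(refit)
```

The first result says the sample is above the training mean on the first feature and below it on the second; the second result is all zeros regardless of the input.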
