Machine learning exercise 1
Install keras
#!pip install tensorflow
#!pip install keras
Importing Libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import tensorflow
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from sklearn import metrics
from sklearn.metrics import confusion_matrix
Getting the data
df = pd.read_csv("D:/diabetes.csv")
df.head()
Let's build a table with the percentage of missing values in each feature and sort it in descending order
x=[]
z=[]
for i in df:
    # Percentage of missing values in column i
    v=df[i].isnull().sum()/df.shape[0]*100
    x.append(v)
    z.append(i)
q={"Feature Name":z,"Percentage of missing values":x,}
missingPercentageDataset=pd.DataFrame(q).sort_values(by="Percentage of missing values",ascending=False)
pd.set_option("display.max_rows",None)
missingPercentageDataset
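The same table can be built without an explicit loop. The snippet below is a minimal sketch on a small made-up frame (the column names and values are hypothetical, not taken from the real CSV); it relies on the fact that the mean of a boolean mask is the fraction of True entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "Glucose": [148.0, np.nan, 183.0, 89.0],
    "BMI": [33.6, 26.6, np.nan, np.nan],
    "Outcome": [1, 0, 1, 0],
})

# isnull().mean() gives the fraction of missing entries per column
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)

# Turn the Series into a two-column table like the one built above
missing_table = missing_pct.rename_axis("Feature Name").reset_index(
    name="Percentage of missing values"
)
print(missing_table)
```

On the real data, replacing `toy` with `df` should give the same numbers as the loop version.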
Separating data and labels
x=df.drop(columns="Outcome")
y=df["Outcome"]
Data standardization
We standardize only the independent variables, not the label
scaler= StandardScaler()
transform_scale_data=scaler.fit_transform(x)
transform_scale_data
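To see what the scaler actually does, here is a small sketch on made-up numbers (the array `features` is hypothetical). Each column is shifted to mean 0 and scaled to standard deviation 1, and the result matches a manual z-score computed with the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 3 samples, 2 features on different scales
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaler_demo = StandardScaler()
scaled = scaler_demo.fit_transform(features)

# Manual z-score: (value - column mean) / column std, with ddof=0
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```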
Let's set the final X (all the independent variables) and Y (the dependent variable)
You might ask: we already put the independent variables in x and the dependent variable in y,
so why are we doing the same thing again?
The answer is simple. In a classification problem we standardize only the independent variables,
not the dependent variable. Earlier we just separated the dataset: the independent variables went
into x and the dependent variable into y. Then we standardized x. The model is trained on these
transformed independent variables, so X has to hold the standardized values, while Y still holds
the original labels.
X=transform_scale_data
Y=df["Outcome"]
Splitting data into train and test data
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,stratify=Y, random_state=51)
print(X.shape, X_train.shape, X_test.shape)
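The `stratify` argument keeps the class proportions the same in the train and test parts. Here is a minimal sketch on made-up labels (70 zeros and 30 ones, both arrays are hypothetical) that checks this.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 30% of them in the positive class
features_demo = np.arange(200).reshape(100, 2)
labels_demo = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    features_demo, labels_demo, test_size=0.2, stratify=labels_demo, random_state=51
)

# Fraction of positive labels in each part
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the two fractions can drift apart, which matters most when one class is rare.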
Creating ANN model
Initializing the model
ann=Sequential()
Adding the input layer and first hidden layer
# input_dim must equal the number of feature columns, so read it from the data
ann.add(Dense(units=6, input_dim=X_train.shape[1], kernel_initializer='he_normal', activation="relu"))
ann.add(Dropout(rate = 0.1))
Adding the Second Hidden layer
ann.add(Dense(units=6, kernel_initializer='he_normal', activation="relu"))
ann.add(Dropout(rate = 0.1))
Adding the third Hidden layer
ann.add(Dense(units=6, kernel_initializer='he_normal', activation="relu"))
ann.add(Dropout(rate = 0.1))
Adding the output layer
ann.add(Dense(units=1,kernel_initializer = 'glorot_normal', activation="sigmoid"))
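It can help to check how many trainable parameters this architecture has. A Dense layer has one weight per input for each unit plus one bias per unit, and Dropout layers add no parameters. The sketch below assumes 11 input features (the length of the example input used in the prediction step); change `n_features` for your data.

```python
def dense_params(n_inputs, n_units):
    # One weight per input plus one bias, for every unit
    return (n_inputs + 1) * n_units

n_features = 11            # assumed number of input features; adjust to your data
layer_units = [6, 6, 6, 1]  # three hidden layers and the output layer

counts = []
prev = n_features
for units in layer_units:
    counts.append(dense_params(prev, units))
    prev = units  # the next layer takes this layer's outputs as inputs

print(counts, sum(counts))
```

The per-layer numbers should agree with the "Param #" column printed by `ann.summary()`.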
Compiling ANN
ann.compile(optimizer="adam",loss="binary_crossentropy",metrics=['accuracy'])
'''
Parameters:
1. optimizer= name of the optimizer used to update the weights (here "adam")
2. loss= name of the loss function to minimize (here "binary_crossentropy")
3. metrics= list of metrics to report during training and evaluation (here accuracy)
'''
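To make the loss and the metric concrete, here is a small NumPy sketch of binary cross-entropy and accuracy on made-up probabilities. The function names and the clipping constant `eps` are assumptions for this illustration, not Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities so log(0) never occurs
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    # A prediction counts as class 1 when its probability exceeds the threshold
    predicted = np.asarray(y_prob) > threshold
    return float(np.mean(predicted == np.asarray(y_true).astype(bool)))

y_true_demo = [1, 0, 1, 0]
y_prob_demo = [0.9, 0.2, 0.4, 0.6]

loss_demo = binary_crossentropy(y_true_demo, y_prob_demo)
acc_demo = accuracy(y_true_demo, y_prob_demo)
print(loss_demo, acc_demo)
```

Confident correct predictions contribute a small loss, while the two wrong ones (0.4 for a positive, 0.6 for a negative) dominate the average.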
Fitting the model
ann.fit(X_train,Y_train,validation_split=0.33,batch_size=12,epochs = 100)
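With `validation_split`, Keras holds out the last fraction of the rows passed to `fit` (taken before shuffling) and trains on the rest. The sketch below estimates how many rows end up in each part and how many weight updates one epoch makes. The row count is hypothetical, and the exact rounding Keras applies may differ by one sample.

```python
import math

n_train = 800            # hypothetical number of rows passed to fit()
validation_split = 0.33
batch_size = 12

# Approximate sizes of the held-out and fitted parts
n_val = int(n_train * validation_split)
n_fit = n_train - n_val

# One weight update per batch; the last batch may be smaller
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)
```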
Score and accuracy of train data
score, acc = ann.evaluate(X_train, Y_train, batch_size=10)
print('Train score:', score)
print('Train accuracy:', acc)
Score and accuracy of test data
score, acc = ann.evaluate(X_test, Y_test, batch_size=10)
print('Test score:', score)
print('Test accuracy:', acc)
Confusion Matrix
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
cm = confusion_matrix(Y_test, y_pred)
import seaborn as sns
import matplotlib.pyplot as plt
p = sns.heatmap(pd.DataFrame(cm), annot=True, cmap="YlGnBu" ,fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Classification report
from sklearn.metrics import classification_report
print(classification_report(Y_test,y_pred))
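The precision, recall and F1 values in the report can be derived from the four counts of the confusion matrix. Here is a minimal sketch on made-up labels (both arrays are hypothetical) for the positive class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# For binary labels, ravel() returns the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy_demo = (tp + tn) / (tp + tn + fp + fn)
precision_demo = tp / (tp + fp)   # how many predicted positives are correct
recall_demo = tp / (tp + fn)      # how many actual positives are found
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(tn, fp, fn, tp)
print(accuracy_demo, precision_demo, recall_demo, f1_demo)
```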
Prediction system
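input_data=(0,1,0,1,0,3000,0.0,66.0,360.0,1.0,1 )
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)
# Apply the same scaling that was used on the training data
input_data_scaled=scaler.transform(input_data_reshaped)
prediction=ann.predict(input_data_scaled)
# The sigmoid output is a probability, so compare it with 0.5 instead of 0
if (prediction[0][0]<=0.5):
    print("Loan status no")
else:
    print("Loan status yes")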
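The prediction step can be wrapped in a small helper so the scaling and the 0.5 threshold are always applied together. This is only a sketch: `StubModel` is a hypothetical stand-in for the trained network (it just returns a sigmoid of the feature sum), and `predict_label` is a name made up for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

class StubModel:
    """Hypothetical stand-in for the trained network; returns scores of shape (n, 1)."""
    def predict(self, X):
        return 1.0 / (1.0 + np.exp(-X.sum(axis=1, keepdims=True)))

def predict_label(model, fitted_scaler, raw_features, threshold=0.5):
    # Reshape one sample to (1, n_features), scale it, then threshold the probability
    row = np.asarray(raw_features, dtype=float).reshape(1, -1)
    prob = float(model.predict(fitted_scaler.transform(row))[0][0])
    return int(prob > threshold), prob

# Fit a scaler on made-up training rows
demo_scaler = StandardScaler().fit(np.array([[0.0, 0.0], [2.0, 4.0]]))

label_high, prob_high = predict_label(StubModel(), demo_scaler, (2.0, 4.0))
label_low, prob_low = predict_label(StubModel(), demo_scaler, (0.0, 0.0))
print(label_high, prob_high)
print(label_low, prob_low)
```

With the real objects you would call it as `predict_label(ann, scaler, input_data)`.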