5. AI AND MACHINE LEARNING VTU LAB | READ NOW
MACHINE LEARNING VTU LAB- NAÏVE BAYESIAN CLASSIFIER
Program 5. WRITE A PROGRAM TO IMPLEMENT THE NAÏVE BAYESIAN CLASSIFIER FOR A SAMPLE TRAINING DATA SET STORED AS A .CSV FILE. COMPUTE THE ACCURACY OF THE CLASSIFIER, CONSIDERING FEW TEST DATA SETS.
Program Code – lab5.py
# import necessary libarities import pandas as pd from sklearn import tree from sklearn.preprocessing import LabelEncoder from sklearn.naive_bayes import GaussianNB # load data from CSV data = pd.read_csv('tennisdata.csv') print("THe first 5 values of data is :\n",data.head()) # obtain Train data and Train output X = data.iloc[:,:-1] print("\nThe First 5 values of train data is\n",X.head()) y = data.iloc[:,-1] print("\nThe first 5 values of Train output is\n",y.head()) # Convert then in numbers le_outlook = LabelEncoder() X.Outlook = le_outlook.fit_transform(X.Outlook) le_Temperature = LabelEncoder() X.Temperature = le_Temperature.fit_transform(X.Temperature) le_Humidity = LabelEncoder() X.Humidity = le_Humidity.fit_transform(X.Humidity) le_Windy = LabelEncoder() X.Windy = le_Windy.fit_transform(X.Windy) print("\nNow the Train data is :\n",X.head()) le_PlayTennis = LabelEncoder() y = le_PlayTennis.fit_transform(y) print("\nNow the Train output is\n",y) from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20) classifier = GaussianNB() classifier.fit(X_train,y_train) from sklearn.metrics import accuracy_score print("Accuracy is:",accuracy_score(classifier.predict(X_test),y_test))
MACHINE LEARNING Program Execution – lab5.ipynb
Jupyter Notebook program execution.
# import necessary libarities import pandas as pd from sklearn import tree from sklearn.preprocessing import LabelEncoder from sklearn.naive_bayes import GaussianNB # load data from CSV data = pd.read_csv('tennisdata.csv') print("THe first 5 values of data is :\n",data.head())
THe first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
# obtain Train data and Train output X = data.iloc[:,:-1] print("\nThe First 5 values of train data is\n",X.head())
The First 5 values of train data is
Outlook Temperature Humidity Windy
0 Sunny Hot High False
1 Sunny Hot High True
2 Overcast Hot High False
3 Rainy Mild High False
4 Rainy Cool Normal False
y = data.iloc[:,-1] print("\nThe first 5 values of Train output is\n",y.head())
The first 5 values of Train output is
0 No
1 No
2 Yes
3 Yes
4 Yes
Name: PlayTennis, dtype: object
# Convert then in numbers le_outlook = LabelEncoder() X.Outlook = le_outlook.fit_transform(X.Outlook) le_Temperature = LabelEncoder() X.Temperature = le_Temperature.fit_transform(X.Temperature) le_Humidity = LabelEncoder() X.Humidity = le_Humidity.fit_transform(X.Humidity) le_Windy = LabelEncoder() X.Windy = le_Windy.fit_transform(X.Windy) print("\nNow the Train data is :\n",X.head())
Now the Train data is :
Outlook Temperature Humidity Windy
0 2 1 0 0
1 2 1 0 1
2 0 1 0 0
3 1 2 0 0
4 1 0 1 0
le_PlayTennis = LabelEncoder() y = le_PlayTennis.fit_transform(y) print("\nNow the Train output is\n",y)
Now the Train output is
[0 0 1 1 1 0 1 0 1 1 1 1 1 0]
from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20) classifier = GaussianNB() classifier.fit(X_train,y_train) from sklearn.metrics import accuracy_score print("Accuracy is:",accuracy_score(classifier.predict(X_test),y_test))
Accuracy is: 0.6666666666666666
Download Dataset