6. AI AND MACHINE LEARNING VTU LAB | READ NOW

MACHINE LEARNING VTU LAB – Naive Bayesian Classifier(using API)

Program 6. ASSUMING A SET OF DOCUMENTS THAT NEED TO BE CLASSIFIED, USE THE NAÏVE BAYESIAN CLASSIFIER MODEL TO PERFORM THIS TASK. BUILT-IN JAVA CLASSES/API CAN BE USED TO WRITE THE PROGRAM. CALCULATE THE ACCURACY, PRECISION, AND RECALL FOR YOUR DATA SET.


Program Code – lab6.py

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
print(df[0:5])
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
for doc, p in zip(Xtrain, pred):
p = 'pos' if p == 1 else 'neg'
print("%s -> %s" % (doc, p))
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
import pandas as pd msg = pd.read_csv('document.csv', names=['message', 'label']) print("Total Instances of Dataset: ", msg.shape[0]) msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0}) X = msg.message y = msg.labelnum from sklearn.model_selection import train_test_split Xtrain, Xtest, ytrain, ytest = train_test_split(X, y) from sklearn.feature_extraction.text import CountVectorizer count_v = CountVectorizer() Xtrain_dm = count_v.fit_transform(Xtrain) Xtest_dm = count_v.transform(Xtest) df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names()) print(df[0:5]) from sklearn.naive_bayes import MultinomialNB clf = MultinomialNB() clf.fit(Xtrain_dm, ytrain) pred = clf.predict(Xtest_dm) for doc, p in zip(Xtrain, pred): p = 'pos' if p == 1 else 'neg' print("%s -> %s" % (doc, p)) from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score print('Accuracy Metrics: \n') print('Accuracy: ', accuracy_score(ytest, pred)) print('Recall: ', recall_score(ytest, pred)) print('Precision: ', precision_score(ytest, pred)) print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer

count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)

df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
print(df[0:5])

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

for doc, p in zip(Xtrain, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))

MACHINE LEARNING Program Execution – lab6.ipynb

Jupyter Notebook program execution.

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
import pandas as pd msg = pd.read_csv('document.csv', names=['message', 'label']) print("Total Instances of Dataset: ", msg.shape[0]) msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

Total Instances of Dataset: 18

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
X = msg.message y = msg.labelnum from sklearn.model_selection import train_test_split Xtrain, Xtest, ytrain, ytest = train_test_split(X, y) from sklearn.feature_extraction.text import CountVectorizer count_v = CountVectorizer() Xtrain_dm = count_v.fit_transform(Xtrain) Xtest_dm = count_v.transform(Xtest)
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer

count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
print(df[0:5])
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names()) print(df[0:5])
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
print(df[0:5])

about am an and awesome bad beers best boss can … tired to \
0 0 1 0 1 0 0 0 0 0 0 … 1 0
1 0 0 0 0 0 0 0 0 0 0 … 0 0
2 0 0 0 0 0 0 0 0 0 0 … 0 0
3 0 0 0 0 0 0 0 0 0 1 … 0 0
4 0 0 0 0 0 0 0 0 0 0 … 0 0

today tomorrow very we went will with work
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 0
4 0 0 0 0 0 0 0 0

[5 rows x 49 columns]

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
from sklearn.naive_bayes import MultinomialNB clf = MultinomialNB() clf.fit(Xtrain_dm, ytrain) pred = clf.predict(Xtest_dm)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
for doc, p in zip(Xtrain, pred):
p = 'pos' if p == 1 else 'neg'
print("%s -> %s" % (doc, p))
for doc, p in zip(Xtrain, pred): p = 'pos' if p == 1 else 'neg' print("%s -> %s" % (doc, p))
for doc, p in zip(Xtrain, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

I am sick and tired of this place -> pos
I do not like the taste of this juice -> neg
I love this sandwich -> neg
I can’t deal with this -> pos
I do not like this restaurant -> neg

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score print('Accuracy Metrics: \n') print('Accuracy: ', accuracy_score(ytest, pred)) print('Recall: ', recall_score(ytest, pred)) print('Precision: ', precision_score(ytest, pred)) print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))

Accuracy Metrics:

Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]

Alternative- alt lab6.ipynb

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from sklearn.datasets import fetch_20newsgroups
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import numpy as np
from sklearn.datasets import fetch_20newsgroups from sklearn.metrics import confusion_matrix from sklearn.metrics import classification_report import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import numpy as np
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train',categories=categories,shuffle=True)
twenty_test = fetch_20newsgroups(subset='test',categories=categories,shuffle=True)
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med'] twenty_train = fetch_20newsgroups(subset='train',categories=categories,shuffle=True) twenty_test = fetch_20newsgroups(subset='test',categories=categories,shuffle=True)
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train',categories=categories,shuffle=True)
twenty_test = fetch_20newsgroups(subset='test',categories=categories,shuffle=True)
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
print(len(twenty_train.data))
print(len(twenty_test.data))
print(twenty_train.target_names)
print("\n".join(twenty_train.data[0].split("\n")))
print(twenty_train.target[0])
print(len(twenty_train.data)) print(len(twenty_test.data)) print(twenty_train.target_names) print("\n".join(twenty_train.data[0].split("\n"))) print(twenty_train.target[0])
print(len(twenty_train.data))
print(len(twenty_test.data))
print(twenty_train.target_names)
print("\n".join(twenty_train.data[0].split("\n")))
print(twenty_train.target[0])

2257
1502
[‘alt.atheism’, ‘comp.graphics’, ‘sci.med’, ‘soc.religion.christian’]
From: sd345@city.ac.uk (Michael Collier)
Subject: Converting images to HP LaserJet III?
Nntp-Posting-Host: hampton
Organization: The City University
Lines: 14

Does anyone know of a good way (standard PC application/PD utility) to
convert tif/img/tga files into LaserJet III format. We would also like to
do the same, converting to HPGL (HP plotter) files.

Please email any response.

Is this the correct group?

Thanks in advance. Michael.

Michael Collier (Programmer) The Computer Unit,
Email: M.P.Collier@uk.ac.city The City University,
Tel: 071 477-8000 x3769 London,
Fax: 071 477-8565 EC1V 0HB.

See also  8. C PROGRAMMING LAB | Check Now

1

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_tf = count_vect.fit_transform(twenty_train.data)
from sklearn.feature_extraction.text import CountVectorizer count_vect = CountVectorizer() X_train_tf = count_vect.fit_transform(twenty_train.data)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_tf = count_vect.fit_transform(twenty_train.data)
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf)
X_train_tfidf.shape
from sklearn.feature_extraction.text import TfidfTransformer tfidf_transformer = TfidfTransformer() X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf) X_train_tfidf.shape
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf)
X_train_tfidf.shape

(2257, 35788)

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn import metrics
mod = MultinomialNB()
mod.fit(X_train_tfidf, twenty_train.target)
X_test_tf = count_vect.transform(twenty_test.data)
X_test_tfidf = tfidf_transformer.transform(X_test_tf)
predicted = mod.predict(X_test_tfidf)
from sklearn.naive_bayes import MultinomialNB from sklearn.metrics import accuracy_score from sklearn import metrics mod = MultinomialNB() mod.fit(X_train_tfidf, twenty_train.target) X_test_tf = count_vect.transform(twenty_test.data) X_test_tfidf = tfidf_transformer.transform(X_test_tf) predicted = mod.predict(X_test_tfidf)
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn import metrics
mod = MultinomialNB()
mod.fit(X_train_tfidf, twenty_train.target)
X_test_tf = count_vect.transform(twenty_test.data)
X_test_tfidf = tfidf_transformer.transform(X_test_tf)
predicted = mod.predict(X_test_tfidf)
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
print("Accuracy:", accuracy_score(twenty_test.target, predicted))
print(classification_report(twenty_test.target,predicted,target_names=twenty_test.target_names))
print("confusion matrix is \n",metrics.confusion_matrix(twenty_test.target, predicted))
print("Accuracy:", accuracy_score(twenty_test.target, predicted)) print(classification_report(twenty_test.target,predicted,target_names=twenty_test.target_names)) print("confusion matrix is \n",metrics.confusion_matrix(twenty_test.target, predicted))
print("Accuracy:", accuracy_score(twenty_test.target, predicted))
print(classification_report(twenty_test.target,predicted,target_names=twenty_test.target_names))
print("confusion matrix is \n",metrics.confusion_matrix(twenty_test.target, predicted))

Accuracy: 0.8348868175765646
precision recall f1-score support

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
precision recall f1-score support
alt.atheism 0.97 0.60 0.74 319
comp.graphics 0.96 0.89 0.92 389
sci.med 0.97 0.81 0.88 396
soc.religion.christian 0.65 0.99 0.78 398
avg / total 0.88 0.83 0.84 1502
precision recall f1-score support alt.atheism 0.97 0.60 0.74 319 comp.graphics 0.96 0.89 0.92 389 sci.med 0.97 0.81 0.88 396 soc.religion.christian 0.65 0.99 0.78 398 avg / total 0.88 0.83 0.84 1502
                    precision    recall  f1-score   support

       alt.atheism       0.97      0.60      0.74       319
     comp.graphics       0.96      0.89      0.92       389
           sci.med       0.97      0.81      0.88       396
soc.religion.christian       0.65      0.99      0.78       398
       avg / total       0.88      0.83      0.84      1502

confusion matrix is
[[192 2 6 119]
[ 2 347 4 36]
[ 2 11 322 61]
[ 2 2 1 393]]

Download Dataset

Leave a Reply

Your email address will not be published. Required fields are marked *

WhatsApp Icon Join For Job Alerts