I am trying to calculate the accuracy of a multinomial Naive Bayes classifier in scikit-learn.
Here is the code:
```python
import random
from collections import Counter

import numpy as np
from sklearn import naive_bayes
from sklearn.preprocessing import LabelBinarizer

dim0 = ['vhigh', 'high', 'med', 'low']
dim1 = ['vhigh', 'high', 'med', 'low']
dim2 = ['2', '3', '4', '5more']
dim3 = ['2', '4', 'more']
dim4 = ['big', 'med', 'small']
dim5 = ['high', 'low', 'med']
target = ['acc', 'good', 'unacc', 'vgood']
dimensions = [dim0, dim1, dim2, dim3, dim4, dim5, target]

# Read the dataset
def readDataSet(fname):
    f = open(fname, 'r')
    dataset = []
    for line in f:
        words = []
        tokenized = line.strip().split(',')
        if len(tokenized) != 7:
            continue
        for w in tokenized:
            words.append(w)
        dataset.append(np.array(words))
    return np.array(dataset)

# Split the dataset into features X and labels Y
# (the target is the last column of the data)
def XYfromDataset(dataset):
    X = []
    Y = []
    for d in dataset:
        X.append(np.array(d[:-1]))
        Y.append(d[-1])
    return np.array(X), np.array(Y)

def splitXY(X, Y, perc):
    splitpos = int(len(X) * perc)
    X_train = X[:splitpos]
    X_test = X[splitpos:]
    Y_train = Y[:splitpos]
    Y_test = Y[splitpos:]
    return (X_train, Y_train, X_test, Y_test)

def mapDimension(dimen, mapping):
    res = []
    for d in dimen:
        res.append(float(mapping.index(d)))
    return np.array(res)

def runTrails(dataset, split=0.66):
    random.shuffle(dataset, random.random)
    (X, Y) = XYfromDataset(dataset)
    (X_train, Y_train, X_test, Y_test) = splitXY(X, Y, split)
    mnb = naive_bayes.MultinomialNB()
    mnb.fit(X_train, Y_train)
    score = mnb.score(X_test, Y_test)
    mnb = None
    return score

dataset = readDataSet("car.data")
print Counter(dataset[:, 6])

for d in range(7):
    dataset[:, d] = mapDimension(dataset[:, d], dimensions[d])
dataset = dataset.astype(float)

score = 0.0
num_trails = 10
for t in range(num_trails):
    acc = runTrails(dataset)
    print "trail", t, "accuracy:", acc
    score += acc
print score / num_trails
```
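To make the encoding step concrete: `mapDimension` replaces each categorical value by its index in the corresponding value list. Here is a standalone sketch of that function applied to one column (the value list is an illustrative example, using the same index-based mapping as the code above):

```python
import numpy as np

def mapDimension(dimen, mapping):
    # Replace each categorical value by its index in the value list
    res = []
    for d in dimen:
        res.append(float(mapping.index(d)))
    return np.array(res)

# Example value list and column (illustrative, not from the real file)
dim5 = ['high', 'low', 'med']
col = ['low', 'low', 'high', 'med']
encoded = mapDimension(col, dim5)  # -> [1.0, 1.0, 0.0, 2.0]
```

Note that this produces ordinal codes, not counts; I am not sure how well that matches the count-based feature model that `MultinomialNB` assumes.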
Here is the output (the class distribution first, then the accuracy of each trail):
```
Counter({'unacc': 1210, 'acc': 384, 'good': 69, 'vgood': 65})
trail 0 accuracy: 0.583333333333
trail 1 accuracy: 0.583333333333
trail 2 accuracy: 0.583333333333
trail 3 accuracy: 0.583333333333
trail 4 accuracy: 0.583333333333
trail 5 accuracy: 0.583333333333
trail 6 accuracy: 0.583333333333
trail 7 accuracy: 0.583333333333
trail 8 accuracy: 0.583333333333
trail 9 accuracy: 0.583333333333
0.583333333333
```
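For context, the class counts above imply a simple majority-class baseline (my own quick arithmetic, not part of the program):

```python
from collections import Counter

# Class counts from the Counter printed above
counts = Counter({'unacc': 1210, 'acc': 384, 'good': 69, 'vgood': 65})
total = sum(counts.values())                    # 1728 rows in total
baseline = counts.most_common(1)[0][1] / total  # always predict 'unacc'
# baseline is about 0.70, so 0.583 is below always guessing 'unacc'
```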
I thought the ordering of the items might affect the accuracy, since the dataset file is sorted by class, which is why I shuffle it on every trail. Given the class distribution, I expected the accuracy to be around 70% (the share of the majority class).
But why does the accuracy stay exactly the same on every trail? It makes no sense to me. If the same model were being trained repeatedly it might keep improving, but here I create a fresh classifier and reshuffle the dataset on every trail.
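One detail I am unsure about (it may or may not be the cause): the standard-library `random.shuffle` is documented for mutable sequences like lists, and in CPython it swaps items with `x[i], x[j] = x[j], x[i]`. On a 2-D NumPy array, `x[i]` and `x[j]` on the right-hand side are views, not copies, so a swap can duplicate a row instead of exchanging two rows. A minimal sketch of that effect:

```python
import numpy as np

x = np.arange(6).reshape(3, 2)  # rows: [0 1], [2 3], [4 5]

# The swap idiom random.shuffle uses internally (CPython).
# With a NumPy array, the right-hand side holds *views* into x.
i, j = 0, 2
x[i], x[j] = x[j], x[i]

# x[0] is overwritten with row 2's data first, and the write-back
# to x[2] then reads the already-overwritten view, so row [0 1]
# is lost: x is now [[4, 5], [2, 3], [4, 5]]
```

If that is the problem, `np.random.shuffle(dataset)` is the NumPy-aware way to permute the rows in place, though I have not verified whether it changes the scores above.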