python - Shuffling Increasing Accuracy - sklearn - MultinomialNaiveBayes


I am trying to calculate the accuracy of a multinomial Naive Bayes classifier in scikit-learn.

Here is the code:

    import random
    from collections import Counter

    import numpy as np
    from sklearn import naive_bayes
    from sklearn.preprocessing import LabelBinarizer

    # Possible values for every attribute (UCI "car evaluation" dataset)
    dim0 = ['high', 'low', 'med', 'vhigh']      # buying
    dim1 = ['high', 'low', 'med', 'vhigh']      # maint
    dim2 = ['2', '3', '4', '5more']             # doors
    dim3 = ['2', '4', 'more']                   # persons
    dim4 = ['big', 'med', 'small']              # lug_boot
    dim5 = ['high', 'low', 'med']               # safety
    target = ['acc', 'good', 'unacc', 'vgood']  # class label
    dimension = [dim0, dim1, dim2, dim3, dim4, dim5, target]

    # Function to read the dataset from a comma-separated file
    def readDataSet(fname):
        f = open(fname, 'r')
        dataset = []
        for line in f:
            words = []
            tokenized = line.strip().split(',')
            if len(tokenized) != 7:
                continue
            for w in tokenized:
                words.append(w)
            dataset.append(np.array(words))
        return np.array(dataset)

    # Split the dataset into features X and labels Y
    # (the last column of the data is the target)
    def XYfromDataset(dataset):
        X = []
        Y = []
        for d in dataset:
            X.append(np.array(d[:-1]))
            Y.append(d[-1])
        return np.array(X), np.array(Y)

    def splitXY(X, Y, perc):
        splitpos = int(len(X) * perc)
        X_train = X[:splitpos]
        X_test = X[splitpos:]
        Y_train = Y[:splitpos]
        Y_test = Y[splitpos:]
        return (X_train, Y_train, X_test, Y_test)

    # Map every categorical value to its index in the corresponding value list
    def mapDimension(dimen, mapping):
        res = []
        for d in dimen:
            res.append(float(mapping.index(d)))
        return np.array(res)

    def runTrail(dataset, split=0.66):
        random.shuffle(dataset, random.random)
        (X, Y) = XYfromDataset(dataset)
        (X_train, Y_train, X_test, Y_test) = splitXY(X, Y, split)
        mnb = naive_bayes.MultinomialNB()
        mnb.fit(X_train, Y_train)
        score = mnb.score(X_test, Y_test)
        mnb = None
        return score

    dataset = readDataSet("car.data")
    print Counter(dataset[:, 6])

    # Convert every column from strings to numeric indices
    for d in range(len(dimension)):
        dataset[:, d] = mapDimension(dataset[:, d], dimension[d])
    dataset = dataset.astype(float)

    score = 0.0
    num_trails = 10
    for t in range(num_trails):
        acc = runTrail(dataset)
        print "trail", t, "accuracy:", acc
        score += acc
    print score / num_trails
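
For reference, the same fit/score cycle could also be written with scikit-learn's train_test_split, which does the shuffling itself. This is only a sketch: it assumes the numeric X and Y arrays returned by XYfromDataset above, and the import path is sklearn.cross_validation in older releases (sklearn.model_selection in newer ones).

    from sklearn import naive_bayes
    from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

    # X, Y: the numeric feature matrix and label vector built by XYfromDataset(dataset)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.34)

    mnb = naive_bayes.MultinomialNB()
    mnb.fit(X_train, Y_train)
    print(mnb.score(X_test, Y_test))
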
With the shuffle in place, the reported accuracy keeps climbing from one trial to the next. If I instead remove the random.shuffle() call from runTrail(), the output (the first line is the class distribution of the dataset) is:

    Counter({'unacc': 1210, 'acc': 384, 'good': 69, 'vgood': 65})
    trail 0 accuracy: 0.583333333333
    trail 1 accuracy: 0.583333333333
    trail 2 accuracy: 0.583333333333
    trail 3 accuracy: 0.583333333333
    trail 4 accuracy: 0.583333333333
    trail 5 accuracy: 0.583333333333
    trail 6 accuracy: 0.583333333333
    trail 7 accuracy: 0.583333333333
    trail 8 accuracy: 0.583333333333
    trail 9 accuracy: 0.583333333333
    0.583333333333

I understand that the ordering of the items in the dataset can hurt the algorithm's accuracy, because the rows of the dataset are ordered by class.

Therefore, the accuracy in the first iteration is approximately 0.70.
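
To see how skewed an unshuffled 66% split is, something like the following should work (just a sketch, assuming the numeric dataset array built above, where column 6 holds the class index):

    from collections import Counter

    # dataset: the numeric numpy array built above; column 6 holds the class index
    splitpos = int(len(dataset) * 0.66)
    print(Counter(dataset[:splitpos, 6]))  # class counts in the training portion
    print(Counter(dataset[splitpos:, 6]))  # class counts in the test portion
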

But why does the accuracy keep growing? It makes no sense to me. If the classifier were being trained incrementally it would get better over time, but here I reshuffle the dataset and create a new classifier instance on every trial.
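
One check I can think of (a diagnostic sketch only, assuming the numeric dataset array from above) is whether the repeated in-place random.shuffle() calls leave the data itself unchanged, by comparing the class counts before and after several shuffles:

    import random
    from collections import Counter

    # dataset: the numeric 2-D numpy array built above; column 6 holds the class index
    print(Counter(dataset[:, 6]))        # class counts before shuffling
    for _ in range(10):
        random.shuffle(dataset, random.random)
    print(Counter(dataset[:, 6]))        # class counts after ten in-place shuffles
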

