Replace words based on bigram frequency in Python


I have a series-type object to which I have to apply a function that uses bigrams to correct words. I created a list of bigrams, sorted it by frequency (most frequent first), and called it fdist.

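    import nltk

    bigrams = [b for l in text2 for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
    freq = nltk.FreqDist(bigrams)  # frequency of occurrence of each bigram
    # keys() is sorted by frequency in NLTK 2 only; in NLTK 3 use
    # [bg for bg, _ in freq.most_common()] instead
    fdist = freq.keys()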
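
To check the bigram step in isolation, here is a minimal sketch that swaps nltk.FreqDist for collections.Counter so that it runs without NLTK. The sample text2 and the helper name build_fdist are made up for illustration and are not part of the original code.

```python
from collections import Counter

# Hypothetical sample data standing in for text2.
text2 = ["lets go towards the east", "lets go home"]


def build_fdist(lines):
    """Return every adjacent word pair in lines, most frequent first."""
    bigrams = [
        pair
        for line in lines
        for pair in zip(line.split()[:-1], line.split()[1:])
    ]
    # most_common() orders the pairs by descending count.
    return [pair for pair, _count in Counter(bigrams).most_common()]


fdist = build_fdist(text2)
print(fdist[0])  # the most frequent bigram in the sample
```

Here ("lets", "go") occurs twice and every other pair once, so it comes first in fdist.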

Next, I created a function that takes each line (a sentence, one element of the list) and uses the bigrams to decide whether to correct it:

    # jf is presumably the jellyfish library: import jellyfish as jf
    def bigram_corr(line):  # input: one line (sentence)
        words = line.split()  # split the line into words
        # take two words at a time: words 1,2 then 2,3 then 3,4 and so on
        for word1, word2 in zip(words[:-1], words[1:]):
            for i, j in fdist:  # iterate over the bigrams
                # if the second words match and the first word is at an edit
                # distance of 1 or 2, replace it with the first word of the
                # most frequent matching bigram
                if (word2 == j) and (jf.levenshtein_distance(word1, i) < 3):
                    word1 = i  # replace
                    return word1  # return the word

The problem is that only a single word is returned for the entire sentence. For example, "Lts go towards the east" is replaced by "lets". It looks like the later iterations are not working.

The for loop over word1, word2 is meant to work like this:

"Lts go" in the 1st iteration, which will eventually be replaced by "lets" because "lets" occurs more frequently with "go".

"go towards" in the 2nd iteration.

"towards the" in the 3rd iteration, and so on.

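The pairing that the question relies on can be seen on its own with a short sketch. The helper name adjacent_pairs and the sample sentence are hypothetical and chosen only for illustration.

```python
def adjacent_pairs(line):
    """Pair each word with the word that follows it."""
    words = line.split()
    # words[:-1] drops the last word, words[1:] drops the first,
    # so zip lines up word n with word n + 1.
    return list(zip(words[:-1], words[1:]))


pairs = adjacent_pairs("Lts go towards the east")
for first, second in pairs:
    print(first, second)
```

A sentence of five words yields four pairs, so a function that returns inside the first iteration never reaches the other three.
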
It looks like you are doing word1 = i with the expectation that this will modify the contents of words, but it will not: the assignment only rebinds the loop variable. If you want to modify words, you have to do so directly. Use enumerate to keep track of the index of word1.

As 2rs2ts pointed out, you are also returning early. If you want the inner loop to end once you find the first good replacement, break instead of returning, and then return at the end of the function.

    def bigram_corr(line):  # input: one line (sentence)
        words = line.split()  # split the line into words
        for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
            for i, j in fdist:  # iterate over the bigrams
                # if the second words match and the first word is at an edit
                # distance of 1 or 2, replace it with the first word of the
                # most frequent matching bigram
                if (word2 == j) and (jf.levenshtein_distance(word1, i) < 3):
                    words[idx] = i  # write the replacement back into the list
                    break  # stop at the first good replacement
        return " ".join(words)
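
The fix can be exercised end to end with a small self-contained sketch. Two things here are assumptions made so the snippet runs without third-party packages: the levenshtein helper is a plain dynamic-programming stand-in for jf.levenshtein_distance, and fdist is a hand-written toy list rather than one computed from real text. Unlike the code above, fdist is passed in as a parameter.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming (stand-in for jf.levenshtein_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical bigram list, assumed to be ordered most frequent first.
fdist = [("lets", "go"), ("go", "towards"), ("towards", "the"), ("the", "east")]


def bigram_corr(line, fdist):
    words = line.split()
    for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
        for first, second in fdist:
            if word2 == second and levenshtein(word1, first) < 3:
                words[idx] = first  # modify the list itself, not the loop variable
                break               # keep the first (most frequent) match
    return " ".join(words)


corrected = bigram_corr("lts go twards the east", fdist)
print(corrected)
```

Because the pairs are built from slices taken before the loop starts, a replacement made at one position does not change the pairs seen in later iterations.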

