I have a string-type object that I need to correct by applying a function that uses bigrams. I made a bigram list, sorted it by frequency (most frequent first), and called it `fdist`:
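```python
import nltk

# text2: a list of sentences (strings)
bigrams = [b for l in text2 for b in zip(l.split()[:-1], l.split()[1:])]
freq = nltk.FreqDist(bigrams)  # frequency of each bigram
fdist = freq.keys()  # NLTK 2: keys sorted by decreasing frequency
```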
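Here is a self-contained sketch of the same counting step that you can run without NLTK. It uses `collections.Counter` in place of `nltk.FreqDist` (in NLTK 3, `FreqDist` is a `Counter` subclass) and a made-up three-sentence corpus in place of `text2`. In NLTK 3, `keys()` is no longer sorted by frequency, so `most_common()` is the safer way to get the bigrams with the most frequent first:

```python
from collections import Counter

# Hypothetical corpus: a list of sentences, standing in for text2.
text2 = [
    "lets go towards the east",
    "lets go home",
    "we go towards the river",
]

# Consecutive word pairs from each sentence, flattened into one list.
bigrams = [
    pair
    for line in text2
    for pair in zip(line.split()[:-1], line.split()[1:])
]

# Counter plays the role of nltk.FreqDist here.
freq = Counter(bigrams)

# most_common() yields (bigram, count) pairs, highest count first;
# ties keep first-seen order.
fdist = [bigram for bigram, count in freq.most_common()]
print(fdist[:3])  # [('lets', 'go'), ('go', 'towards'), ('towards', 'the')]
```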
I have created a function that takes each line (a sentence, i.e. one item of the list) and uses the bigrams to correct it:

```python
import jellyfish as jf  # assumed: jf is the jellyfish library

def bigram_corr(line):  # input line (sentence)
    words = line.split()  # split the line into words
    # build the bigrams: words 1,2, then 2,3, then 3,4 and so on
    for word1, word2 in zip(words[:-1], words[1:]):
        for i, j in fdist:  # iterate over bigrams, most frequent first
            # if the second words match and the first word is within edit
            # distance 1 or 2, replace it with the word from the most
            # frequent bigram
            if (word2 == j) and (jf.levenshtein_distance(word1, i) < 3):
                word1 = i
                return word1
```

The problem is that only one word is returned for the entire sentence. For example, for "lts pre-tested east" only "lets" is returned, so it looks as if the iteration is not working.
The `for word1, word2` loop is meant to work like this:

1. "lts go": "lts" should eventually be replaced by "lets", because "lets go" is a frequent bigram
2. "go towards"
3. "towards the", and so on.
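It looks like you are doing `word1 = i` in the hope that it will modify the contents of `words`, but it will not: assigning to the loop variable only rebinds that local name. If you want to modify `words`, you have to assign into the list itself, and for that you need the position of `word1`. Use `enumerate` to track the index of `word1`.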
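A small standalone demonstration of both points (the three-word sentence is made up): `zip(words[:-1], words[1:])` yields each word with its right-hand neighbour, `enumerate` adds the index, rebinding the loop variable leaves the list alone, and assigning through the index changes it:

```python
words = "lts go twards".split()

# zip pairs each word with its right-hand neighbour;
# enumerate adds the index of the first word of each pair.
pairs = list(enumerate(zip(words[:-1], words[1:])))
print(pairs)  # [(0, ('lts', 'go')), (1, ('go', 'twards'))]

# Rebinding the loop variable only changes what the name points to;
# the list itself is untouched.
for word1, word2 in zip(words[:-1], words[1:]):
    if word1 == "lts":
        word1 = "lets"
after_rebind = list(words)
print(after_rebind)  # ['lts', 'go', 'twards']

# Assigning through the index does change the list.
for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
    if word1 == "lts":
        words[idx] = "lets"
print(words)  # ['lets', 'go', 'twards']
```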
As 2rs2ts said, you are also returning too early. If you want to end the inner loop once you find a good replacement, use `break` instead of `return`, and then return the whole sentence at the end of the function:

```python
def bigram_corr(line):  # input line (sentence)
    words = line.split()  # split the line into words
    for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
        for i, j in fdist:  # iterate over bigrams, most frequent first
            # if the second words match and the first word is within edit
            # distance 1 or 2, replace it with the word from the most
            # frequent bigram
            if (word2 == j) and (jf.levenshtein_distance(word1, i) < 3):
                words[idx] = i  # modify the list itself
                break           # stop at the first (most frequent) match
    return " ".join(words)
```
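To check the fix end to end without NLTK or jellyfish installed, here is a self-contained sketch that runs the question's version and the fixed version side by side. Everything in it beyond the two function bodies is a hypothetical stand-in: a tiny hand-written bigram list `FDIST` replaces `fdist`, `levenshtein` is a plain dynamic-programming edit distance replacing `jf.levenshtein_distance`, `fdist` is passed as a parameter rather than read from a global, and the sentence "lts go twards the east" is made up:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP.
    Stand-in for jf.levenshtein_distance."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


# Hypothetical toy bigram list, most frequent first (stands in for fdist).
FDIST = [("lets", "go"), ("go", "towards"), ("towards", "the"), ("the", "east")]


def bigram_corr_original(line, fdist=FDIST):
    """The question's version: returns inside the loops."""
    words = line.split()
    for word1, word2 in zip(words[:-1], words[1:]):
        for i, j in fdist:
            if word2 == j and levenshtein(word1, i) < 3:
                word1 = i  # rebinds a local name; words is unchanged
                return word1


def bigram_corr(line, fdist=FDIST):
    """The answer's version: write through the index, break, return at the end."""
    words = line.split()
    for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
        for i, j in fdist:
            if word2 == j and levenshtein(word1, i) < 3:
                words[idx] = i  # modify the list itself
                break           # keep the most frequent matching bigram
    return " ".join(words)


print(bigram_corr_original("lts go twards the east"))  # lets
print(bigram_corr("lts go twards the east"))           # lets go towards the east
```

Because the inner loop walks the bigram list from most to least frequent and stops at the first match, each word gets its candidate from the most frequent qualifying bigram. Note that `word2 == j` compares against the uncorrected word, so a misspelled second word ("twards" in the pair "go twards") is only fixed when it becomes the first word of the next pair.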