I have a string object to which I need to apply a function that corrects it using bigrams. I built a list of bigrams, sorted by frequency (most frequent first), and it is called fdist.

    bigrams = [b for l in text2 for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
    freq = nltk.FreqDist(bigrams)  # frequency of each bigram
    fdist = freq.keys()  # sorted according to frequency

Next, I created a function that takes each line (a sentence, one element of a list) and uses the bigrams to decide whether to correct it:

    def bigram_corr(line):  # input: a line (sentence)
        words = line.split()  # split the line into words
        for word1, word2 in zip(words[:-1], words[1:]):  # two words at a time: words 1,2 then 2,3, then 3,4 and so on
            for i, j in fdist:  # iterate over the bigrams
                # If the second words match and the first words are within an edit
                # distance of 1 or 2, replace the word with the one from the most
                # frequent bigram.
                if (word2 == j) and (jf.levenshtein_distance(word1, i) < 3):
                    word1 = i  # replace
                    return word1  # return the word

The problem is that only a single word is returned for the entire sentence. For example, "Lts go towards the east" is replaced by "lets". It appears that the further iterations are not working.

The for loop over word1, word2 works like this: the first iteration takes "Lts go", which will eventually be replaced by "lets" because "lets" occurs more frequently with "go". The second iteration takes "go towards", the third takes "towards the", and so on.

It looks like you are doing word1 = i in the hope that it will modify the contents of words, but it will not. Assigning to word1 only rebinds the loop variable. If you want to modify words, you have to do so directly. Use enumerate to keep track of the index of word1.

As 2rs2ts pointed out, you are also returning early. If you want the inner loop to end once you find the first good replacement, use break instead of return, and then return at the end of the function.

    def bigram_corr(line):  # input: a line (sentence)
        words = line.split()  # split the line into words
        for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
            for i, j in fdist:  # iterate over the bigrams
                # If the second words match and the first words are within an edit
                # distance of 1 or 2, replace the word with the one from the most
                # frequent bigram.
                if (word2 == j) and (jf.levenshtein_distance(word1, i) < 3):
                    words[idx] = i
                    break
        return " ".join(words)
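
To try the fix without NLTK or the jf module installed, here is a minimal self-contained sketch. It rests on several assumptions that are not in the original post: the toy corpus in text2 is made up, collections.Counter stands in for nltk.FreqDist, and the hand-written levenshtein function stands in for jf.levenshtein_distance. The ordering comes from most_common() because in NLTK 3, where FreqDist is a Counter subclass, keys() is no longer sorted by frequency.

```python
from collections import Counter


def levenshtein(a, b):
    """Dynamic-programming edit distance (stand-in for jf.levenshtein_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical toy corpus, only for illustration.
text2 = ["lets go towards the east", "lets go home", "we go towards the west"]

bigrams = [b for l in text2 for b in zip(l.split()[:-1], l.split()[1:])]
# Bigrams ordered from most to least frequent.
fdist = [bigram for bigram, _ in Counter(bigrams).most_common()]


def bigram_corr(line):
    words = line.split()
    for idx, (word1, word2) in enumerate(zip(words[:-1], words[1:])):
        for i, j in fdist:
            if word2 == j and levenshtein(word1, i) < 3:
                words[idx] = i  # write the correction back into the list
                break           # stop at the first (most frequent) match
    return " ".join(words)


corrected = bigram_corr("lts go twards the east")
print(corrected)  # lets go towards the east
```

Note that the last word of a line is never word1 in any pair, so this approach never corrects it.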