
+ Search 

Feb 25th, 2003 06:18
Gregoire Dooms, Nathan Wallace, Hans Nowak, Snippet 228, Magnus L. Hetland
""" Packages: miscellaneous """ """ Explanation of the Levenshtein distance algorithm... (Got a request for this, and thought I might as well post it...) The algorithm: """ def distance(a,b): c = {} n = len(a); m = len(b) for i in range(0,n+1): c[i,0] = i for j in range(0,m+1): c[0,j] = j for i in range(1,n+1): for j in range(1,m+1): x = c[i1,j]+1 y = c[i,j1]+1 if a[i1] == b[j1]: z = c[i1,j1] else: z = c[i1,j1]+1 c[i,j] = min(x,y,z) return c[n,m] """ It calculates the following: Given two strings, a and b, and three operations, adding, subtracting and exchanging single characters, what is the minimal number of steps needed to translate a into b? The method is based on the following idea: We want to find the distance between a[:x] and b[:y]. To do this, we first calculate 1) the distance between a[:x1] and b[:y], adding the cost of a subtractoperation, used to get from a[:x] to a[:z1]; 2) the distance between a[:x] and b[:y1], adding the cost of an additionoperation, used to get from b[:y1] to b[:y]; 3) the distance between a[:x1] and b[:y1], adding the cost of a *possible* exchange of the letter b[y] (with a[x]). The cost of the subtraction and addition operations are 1, while the exchange operation has a cost of 1 if a[x] and b[y] are different, and 0 otherwise. After calculating these costs, we choose the least one of them (since we want to use the best solution.) Instead of doing this recursively (i.e. calculating ourselves "back" from the final value), we build a costmatrix c containing the optimal costs, so we can reuse them when calculating the later values. The costs c[i,0] (from string of length n to empty string) are all i, and correspondingly all c[0,j] (from empty string to string of length j) are j. Finally, the cost of translating between the full strings a and b (c[n,m]) is returned. I guess that ought to cover it... """