I have an Excel sheet with data, and I want to get the Levenshtein distance between the values. I have already tried exporting the data as text, reading it into a PHP script, calculating the Levenshtein distance there, and saving the result back to Excel.
But I am looking for a way to calculate the Levenshtein distance programmatically in VBA. How would I go about doing so?
Translated from Wikipedia :
Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer
l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
Next
For j = 0 To l2
d(0, j) = j
Next
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
?Levenshtein("saturday","sunday")
3
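To apply the function to worksheet data without the export and import round trip, a loop over the rows is enough. This is only a sketch: the layout (strings in columns A and B, result in column C of the active sheet) and the name FillLevenshteinColumn are assumptions, not part of the original post.

```vba
' Hypothetical helper: assumes the strings to compare sit in columns A and B
' of the active sheet and writes each distance to column C of the same row.
Sub FillLevenshteinColumn()
    Dim r As Long, lastRow As Long
    With ActiveSheet
        ' Last used row in column A
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        For r = 1 To lastRow
            .Cells(r, "C").Value = Levenshtein(CStr(.Cells(r, "A").Value), _
                                               CStr(.Cells(r, "B").Value))
        Next r
    End With
End Sub
```

Because Levenshtein is a Public Function in a standard module, it can also be called directly from a cell, for example =Levenshtein(A1,B1).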
Thanks to smirkingman for the nice code post. Here is an optimized version.
1) Use Asc(Mid$(s1, i, 1)) instead. Numerical comparison is generally faster than text comparison.
2) Use Mid$ instead of Mid, since the latter is the Variant version and adding $ gives the String version.
3) Use the application function for Min (personal preference only).
4) Use Long instead of Integer, since it's what Excel natively uses.
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
distance(i, j) = distance(i - 1, j - 1)
Else
distance(i, j) = Application.WorksheetFunction.Min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
End If
Next
Next
Levenshtein = distance(string1_length, string2_length)
End Function
UPDATE:
For those who want it: I think it's safe to say that most people use Levenshtein distance to calculate fuzzy match percentages. Here's a way to do that, with an added optimization: you can specify the minimum match percentage to return (the default is 70%). Enter percentages as numbers such as 50 or 80, or 0 to run the formula regardless.
The speed boost comes from checking the lengths of the two strings first, to see whether the match could possibly reach the percentage you give. Please note that there are some areas where this function could be optimized further, but I have kept it this way for the sake of readability. I concatenated the distance onto the result as proof of functionality, but you can fiddle with it :)
Function FuzzyMatch(ByVal string1 As String, _
                    ByVal string2 As String, _
                    Optional min_percentage As Long = 70) As String
    Dim i As Long, j As Long
    Dim string1_length As Long
    Dim string2_length As Long
    Dim distance() As Long, result As Long
    Dim compared As Boolean ' True once the distance has actually been computed

    string1_length = Len(string1)
    string2_length = Len(string2)

    ' Check if not too short
    If string1_length >= string2_length * (min_percentage / 100) Then
        ' Check if not too long
        If string1_length <= string2_length * ((200 - min_percentage) / 100) Then
            ReDim distance(string1_length, string2_length)
            For i = 0 To string1_length: distance(i, 0) = i: Next
            For j = 0 To string2_length: distance(0, j) = j: Next
            For i = 1 To string1_length
                For j = 1 To string2_length
                    If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
                        distance(i, j) = distance(i - 1, j - 1)
                    Else
                        distance(i, j) = Application.WorksheetFunction.Min _
                            (distance(i - 1, j) + 1, _
                             distance(i, j - 1) + 1, _
                             distance(i - 1, j - 1) + 1)
                    End If
                Next
            Next
            result = distance(string1_length, string2_length) 'The distance
            compared = True
        End If
    End If

    ' A distance of 0 means identical strings, so test the flag, not the distance
    If compared And string1_length > 0 Then
        FuzzyMatch = (CLng((100 - ((result / string1_length) * 100)))) & _
            "% (" & result & ")" 'Convert to percentage
    Else
        FuzzyMatch = "Not a match"
    End If
End Function
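The length pre-check in FuzzyMatch can be read on its own. The sketch below repeats the two comparisons in a separate function; the name LengthsCanMatch is hypothetical and not part of the original answer. With the default of 70 and a string2 of 10 characters, string1 has to be between 7 and 13 characters long for the distance to be computed at all.

```vba
' Hypothetical helper: True when the two lengths alone do not rule out a match.
' Uses the same two comparisons as FuzzyMatch.
Function LengthsCanMatch(ByVal len1 As Long, ByVal len2 As Long, _
                         Optional ByVal min_percentage As Long = 70) As Boolean
    LengthsCanMatch = (len1 >= len2 * (min_percentage / 100)) And _
                      (len1 <= len2 * ((200 - min_percentage) / 100))
End Function
```

Passing 0 as min_percentage widens the window to anything from 0 up to twice the length of string2, which is why 0 effectively switches the filter off for most inputs.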
Use a byte array for 17x speed gain
Option Explicit
Public Declare Function GetTickCount Lib "kernel32" () As Long
Sub test()
Dim s1 As String, s2 As String, lTime As Long, i As Long
s1 = Space(100)
s2 = String(100, "a")
lTime = GetTickCount
For i = 1 To 100
LevenshteinStrings s1, s2 ' the original fn from Wikibooks and Stackoverflow
Next
Debug.Print GetTickCount - lTime; " ms" ' 3900 ms for all diff
lTime = GetTickCount
For i = 1 To 100
Levenshtein s1, s2
Next
Debug.Print GetTickCount - lTime; " ms" ' 234 ms
End Sub
'Option Base 0 assumed
'POB: fn with byte array is 17 times faster
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, bs1() As Byte, bs2() As Byte
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
Dim min1 As Long, min2 As Long, min3 As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
bs1 = string1
bs2 = string2
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
'slow way: If Mid$(string1, i, 1) = Mid$(string2, j, 1) Then
If bs1((i - 1) * 2) = bs2((j - 1) * 2) Then ' *2 because each character takes 2 bytes; only the low byte is compared, which is enough when the high byte is 0 (plain Latin text)
distance(i, j) = distance(i - 1, j - 1)
Else
'distance(i, j) = Application.WorksheetFunction.Min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
' spell it out, 50 times faster than worksheetfunction.min
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min1 <= min2 And min1 <= min3 Then
distance(i, j) = min1
ElseIf min2 <= min1 And min2 <= min3 Then
distance(i, j) = min2
Else
distance(i, j) = min3
End If
End If
Next
Next
Levenshtein = distance(string1_length, string2_length)
End Function
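To see why the index is multiplied by 2, it helps to look at what assigning a String to a Byte array produces. A small sketch (the Sub name ShowStringBytes is made up for illustration): VBA strings are stored with two bytes per character, low byte first, so "AB" becomes four bytes.

```vba
' Illustration only: prints the bytes behind a two-character string
Sub ShowStringBytes()
    Dim b() As Byte, k As Long
    b = "AB"                      ' String-to-Byte-array assignment copies the raw bytes
    For k = LBound(b) To UBound(b)
        Debug.Print k, b(k)       ' index 0 -> 65, 1 -> 0, 2 -> 66, 3 -> 0
    Next k
End Sub
```

For characters outside the first 256 code points the second byte is not 0, so a comparison of the low bytes alone can treat two different characters as equal.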
I think it got even faster. I didn't do much other than tune the previous code for speed and return the result as a percentage. Note that the fixed-size arrays limit string1 to 60 characters and string2 to 50.
' Levenshtein3 tweaked for ULTIMATE speed and CORRECT results
' Solution based on Longs
' Intermediate arrays holding Asc() make a difference
' even fixed-length arrays have an impact on speed (small indeed)
' Levenshtein version 3 will return the correct percentage
'
Function Levenshtein3(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long
string1_length = Len(string1): string2_length = Len(string2)
distance(0, 0) = 0
For i = 1 To string1_length: distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length: distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
For j = 1 To string2_length
If smStr1(i) = smStr2(j) Then
distance(i, j) = distance(i - 1, j - 1)
Else
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min2 < min1 Then
If min2 < min3 Then minmin = min2 Else minmin = min3
Else
If min1 < min3 Then minmin = min1 Else minmin = min3
End If
distance(i, j) = minmin
End If
Next
Next
' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein3 = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)
End Function
I have a function which gives Levenshtein Distance used for fuzzy search of a field.
Option Compare Database
' Levenshtein3 tweaked for ULTIMATE speed and CORRECT results
' Solution based on Longs
' Intermediate arrays holding Asc() make a difference
' even fixed-length arrays have an impact on speed (small indeed)
' Levenshtein version 3 will return the correct percentage
'
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long
string1_length = Len(string1): string2_length = Len(string2)
distance(0, 0) = 0
For i = 1 To string1_length: distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length: distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
For j = 1 To string2_length
If smStr1(i) = smStr2(j) Then
distance(i, j) = distance(i - 1, j - 1)
Else
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min2 < min1 Then
If min2 < min3 Then minmin = min2 Else minmin = min3
Else
If min1 < min3 Then minmin = min1 Else minmin = min3
End If
distance(i, j) = minmin
End If
Next
Next
' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)
End Function
When I try to write a SQL statement like the one below, I get a datatype mismatch error:
strSQL_1 = "SELECT * FROM Location_Table ORDER BY (Levenshtein([Short_Desc],'Search String')) DESC;"
Set rs = CurrentDb.OpenRecordset(strSQL_1)
Where Short_Desc is a column in the Location_Table with Short Text datatype.
Using Debug.Print strSQL_1, this is what's shown in the immediate window:
SELECT * FROM Location_Table ORDER BY Levenshtein(Short_Desc,'Search String') DESC;
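One possible cause, offered as an assumption and not confirmed anywhere in this thread: if Short_Desc is Null in any row, Access cannot pass that value to a parameter declared As String, and the query fails with a data type mismatch. A minimal sketch that wraps the field in Nz so the function always receives a string (the Sub name OpenSortedBySimilarity is hypothetical):

```vba
' Sketch only: assumes the mismatch comes from Null values in Short_Desc
Sub OpenSortedBySimilarity()
    Dim strSQL_1 As String
    Dim rs As DAO.Recordset

    ' Nz turns Null into an empty string before Levenshtein sees it
    strSQL_1 = "SELECT * FROM Location_Table " & _
               "ORDER BY Levenshtein(Nz([Short_Desc],''),'Search String') DESC;"
    Set rs = CurrentDb.OpenRecordset(strSQL_1)

    ' ... work with rs here ...

    rs.Close
End Sub
```

Separately, the function above declares fixed arrays of 60 and 50 elements, so a Short_Desc longer than 60 characters would raise a subscript error of its own.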
I have a button that randomly picks a number from 1 to 100 and places it in a cell:
Public Sub RangeValue()
Dim i As Long
Static n As Long, s As String
Const MIN = 1, MAX = 100, OUT = "L10", DEL = "."
Randomize
Do
i = Rnd * (MAX - MIN) + MIN
If 0 = InStr(s, i & DEL) Then
n = n + 1: s = s & i & DEL
Range(OUT) = i
If n > MAX - MIN Then n = 0: s = ""
Exit Do
End If: DoEvents
Loop
End Sub
I now want it to pick 2 numbers from that range and place them into two cells. I have tried:
Public Sub RangeValue()
Dim i As Long
Static n As Long, s As String
Const MIN = 1, MAX = 100, OUT = "L10""L11", DEL = "."
Randomize
Do
i = Rnd * (MAX - MIN) + MIN
If 0 = InStr(s, i & DEL) Then
n = n + 1: s = s & i & DEL
Range(OUT) = i
If n > MAX - MIN Then n = 0: s = ""
Exit Do
End If: DoEvents
Loop
End Sub
But it does not work. The goal is to have 2 numbers picked and put into 2 cells, and to have them not repeat until all 100 numbers have been used.
I have not looked at your code closely, but you can also try the code below, which is simpler to maintain and should give similar results. Note that RANDBETWEEN can return the same number more than once.
Sub RandomValues()
With Range("L10:L11")
.Formula = "=RANDBETWEEN(1,100)"
.Value = .Value
End With
End Sub
How about:
Public arr(1 To 100) As Variant
Sub ButtonCode()
If arr(1) = "" Then
For I = 1 To 100
arr(I) = I
Next I
Call Shuffle(arr)
Range("L10").Value = arr(1)
Else
N = Application.WorksheetFunction.Count(Range("L10:L" & Rows.Count))
Range("L10").Offset(N) = arr(N + 1)
End If
End Sub
Public Sub Shuffle(InOut() As Variant)
Dim I As Long, J As Long
Dim tempF As Double, Temp As Variant
Hi = UBound(InOut)
Low = LBound(InOut)
ReDim Helper(Low To Hi) As Double
Randomize
For I = Low To Hi
Helper(I) = Rnd
Next I
J = (Hi - Low + 1) \ 2
Do While J > 0
For I = Low To Hi - J
If Helper(I) > Helper(I + J) Then
tempF = Helper(I)
Helper(I) = Helper(I + J)
Helper(I + J) = tempF
Temp = InOut(I)
InOut(I) = InOut(I + J)
InOut(I + J) = Temp
End If
Next I
For I = Hi - J To Low Step -1
If Helper(I) > Helper(I + J) Then
tempF = Helper(I)
Helper(I) = Helper(I + J)
Helper(I + J) = tempF
Temp = InOut(I)
InOut(I) = InOut(I + J)
InOut(I + J) = Temp
End If
Next I
J = J \ 2
Loop
End Sub
This code will add values to the list starting with cell L10, without repeats.
EDIT#1:
This should be better:
Dim arr(1 To 100) As Variant
Dim clicks As Integer

Sub ButtonCode2()
    Dim I As Long
    ' Build and shuffle the list on first use, and again once all 100 numbers are used
    If arr(1) = "" Or clicks > 99 Then
        For I = 1 To 100
            arr(I) = I
        Next I
        Call Shuffle(arr)
        clicks = 1
    End If
    Range("L10").Value = arr(clicks)
    Range("L11").Value = arr(clicks + 1)
    clicks = clicks + 2
End Sub
(leave the Shuffle code as is)
There is a famous and very fast permutation routine for VB.NET that permutes numbers in just a few lines, without any helper function, but unfortunately I can't remember it.
Does anyone know this code, or something like it?
Some part of the code is here:
UPDATE: I found it. All the working code is here:
Dim L(4) As Byte
Dim I As Byte
Dim K As Byte
Dim J As Byte
Dim RESULTS As String
Dim UB, UBm1 As Integer
L = {1, 2, 3, 4, 5}
UB = L.GetUpperBound(0)
UBm1 = UB - 1
Do
I = UBm1
Do While I > 0 And L(I) >= L(I + 1)
I -= 1
Loop
K = L(I)
J = UB
Do While J > 0 And L(J) <= K
J -= 1
Loop
RESULTS = L(0) & "," & L(1) & "," & L(2) & "," & L(3) & "," & L(4)
L(I) = L(J)
L(J) = K
Array.Reverse(L, I + 1, UB - I)
Loop While J
It seems like you're looking for this ...
http://www.cut-the-knot.org/do_you_know/AllPerm.shtml
(2. Lexicographic order and finding the next permutation)
... ?
In case you are, the starting values for I and J are incorrect (they should be 4 and 5 instead of 3 and 4 respectively).
(I know the example uses swap, but that can be replaced with a single colon-delimited line.)
Dim L(4) As Byte
L = {1, 2, 3, 4, 5}
Dim K As Byte
For N As Integer = 1 To 120 'No. of permutations: 5!
    Dim RESULT As String = L(0) & "," & L(1) & "," & L(2) & "," & L(3) & "," & L(4)
    'Export / print RESULT as you like, e.g. Console.WriteLine or something
    If N = 120 Then Exit For 'The last permutation (5,4,3,2,1) has no successor
    Dim I As Byte = 4, J As Byte = 5
    'Find the rightmost position whose value is smaller than its right neighbour
    While L(I - 1) >= L(I)
        I -= 1
    End While
    'Find the rightmost value larger than L(I - 1)
    While L(J - 1) <= L(I - 1)
        J -= 1
    End While
    'Swap the two, then reverse the tail
    K = L(I - 1) : L(I - 1) = L(J - 1) : L(J - 1) = K
    I += 1 : J = 5
    While I < J
        K = L(I - 1) : L(I - 1) = L(J - 1) : L(J - 1) = K
        I += 1 : J -= 1
    End While
Next
To generate all permutations of the first P natural numbers (P below 10 is recommended, though it can be larger), this Excel VBA subroutine is very fast and short. I believe it would be easy to convert to VB.NET. Have a look.
Const P = 5 'Recommended Max P = 9
Dim m As Long, PArray(1 To 1000, 1 To P) 'Recommended Max 9! = 362880
Public Sub PermutationNaturalNumber()
Dim Q(1 To P) As Long
For m = 1 To P: Q(m) = m: Next: m = 0
PermutationGenerator P, Q
Range("A1").Resize(UBound(PArray), P) = PArray: End
End Sub
Sub PermutationGenerator(n As Long, Q() As Long)
Dim i As Long, j As Long, k As Long
If n > 1 Then
For i = 1 To n - 1
PermutationGenerator n - 1, Q
If n Mod 2 = 1 Then j = 1 Else j = i
k = Q(j): Q(j) = Q(n): Q(n) = k
Next
PermutationGenerator n - 1, Q
Else
m = m + 1: For i = 1 To P: PArray(m, i) = Q(i): Next
End If
End Sub
P can be any natural number; here P = 5 permutes {1, 2, 3, 4, 5}. Make sure the first upper bound of PArray is at least P! (that is, P*(P-1)*(P-2)*...*1), for example 120 for P = 5.
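To work out the row count PArray needs for a given P, the factorial can be computed directly. This is a small sketch; the Factorial function is not part of the original answer.

```vba
' Hypothetical helper: n! as a Long (5 gives 120, 9 gives 362880).
' A Long holds values up to 12!; 13! overflows.
Function Factorial(ByVal n As Long) As Long
    Dim k As Long
    Factorial = 1
    For k = 2 To n
        Factorial = Factorial * k
    Next k
End Function
```

Since PArray above is declared with constant bounds, the value still has to be typed into the Dim statement by hand; calling ?Factorial(5) in the Immediate window is one way to get the number.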
I have been working on this code for a while. As you can see, after the line "With ws(2)" there is an If condition. Now I have to create multiple such conditions, for 0.6, 0.7, 0.8 and so on, and each condition should use a different table of data. (I am posting a link to the Excel file with the tables so that you can get an idea.) Can I do this with a function, or any other method that won't require me to write this code again for each new condition?
https://docs.google.com/file/d/0B1DVNSutDHR0QWd2UUJsVDZ1Tm8/edit
Private Sub CommandButton1_Click()
Dim x(1 To 9000) As Double, y(1 To 9000) As Double, x1 As Double, y1 As Double, x2 As Double, y2 As Double, I1(1 To 9000) As Double, I2(1 To 9000) As Double, R1(1 To 9000) As Double, R2(1 To 9000) As Double, a As Double, b As Double, c As Double, d As Double, Result(1 To 9000) As Double
Dim i As Integer, j As Integer, k As Integer, p As Integer, q As Integer, r As Integer, s As Integer, t As Integer
Dim ws As Sheets
Set ws = ActiveWorkbook.Sheets(Array("Sheet1", "PLP-1", "PLP-2"))
For t = 0 To 120 Step 20
For k = 1 To 9000
With ws(1)
I1(k) = .Cells(k + 2, 13).Value
I2(k) = .Cells(k + 2, 14).Value
End With
With ws(2)
Select Case .Cells(6 + t, 1).Value
Case 0.5:
r = 0
s = 0
Case 0.6:
r = 20
s = 1
Case 0.7:
r = 40
s = 2
Case 0.8:
r = 60
s = 2
Case 0.9:
r = 80
s = 3
Case 1:
r = 100
s = 4
Case 1.1:
r = 120
s = 5
End Select
For i = 7 To 22
If (.Cells(i + r, 1).Value <= I1(k)) And (I1(k) <= .Cells(i + r + 1, 1).Value) And Not (IsEmpty(I1(k))) Then
p = i + r
x(k) = I1(k)
x1 = .Cells(i + r, 1).Value
x2 = .Cells(i + r + 1, 1).Value
End If
Next i
For j = 2 To 8
If (.Cells(6 + r, j).Value <= I2(k)) And (I2(k) <= .Cells(6 + r, j + 1).Value) And Not (IsEmpty(I2(k))) Then
q = j + r
y(k) = I2(k)
y1 = .Cells(6 + r, j).Value
y2 = .Cells(6 + r, j + 1).Value
End If
Next j
If p <> 0 And q <> 0 Then
a = .Cells(p, q).Value
b = .Cells(p, q + 1).Value
c = .Cells(p + 1, q).Value
d = .Cells(p + 1, q + 1).Value
End If
If I1(k) = Empty Then
R1(k) = 0
Else
R1(k) = (((y2 - y(k)) / (y2 - y1)) * a) + (((y(k) - y1) / (y2 - y1)) * b)
End If
If I2(k) = Empty Then
R2(k) = 0
Else
R2(k) = (((y2 - y(k)) / (y2 - y1)) * c) + (((y(k) - y1) / (y2 - y1)) * d)
End If
Result(k) = (((x2 - x(k)) / (x2 - x1)) * R1(k)) + (((x(k) - x1) / (x2 - x1)) * R2(k))
End With
With ws(1)
.Cells(k + 2, 15 + s).Value = Result(k)
End With
Next k
Next t
End Sub
Try using a Select Case statement as below:
Dim iStart As Long, iEnd As long, jStart As Long, jEnd As Long
'...
With ws(2)
Select Case .Cells(6, 1).Value
Case 0.5:
iStart = 7: iEnd = 22
jStart = 2: jEnd = 7
Case 0.6:
'Same as above but substitute new values for iStart etc.
End Select
For i = iStart To iEnd
'DO STUFF WITH i
Next i
For j = jStart To jEnd
'DO STUFF WITH j
Next j
End With
EDIT: Updated to reflect needs clarified in comments
A more in-depth explanation and usage guide for Select Case can be found here
With regard to your looping, if I understand your code, you need to loop through each "table", but your I and J refer to absolute addresses. What you want is to have I and J be relative to the desired table.
I just used values of 2 to 7, but if the tables are different sizes, you could certainly determine that with code; or even read them into a variant array and do your testing on the array (would often be faster).
So something like the following (pseudo code)
Option Explicit
'N is the Value that defines the proper table
Function DoYourThingOnProperRange(N As Double)
Dim C As Range
Dim I As Long, J As Long
With Sheet1.Columns(1)
Set C = .Find(what:=N, after:=Sheet1.Cells(Rows.Count, "A"), LookIn:=xlValues, _
lookat:=xlWhole, searchorder:=xlByRows, searchdirection:=xlNext)
If Not C Is Nothing Then
Set C = C.CurrentRegion 'C is now set to the proper table
'DoYourThing
'Here's just a dummy routine
For I = 2 To 7
For J = 2 To 7
Debug.Print C(I, J).Address
Next J
Next I
Else
'some kind of error routine for non-existent table
End If
End With
End Function
I am trying to write a general code for matrix multiplication but when I am trying to verify it, the output is always a null matrix. So it seems that the values of the temp matrix are not getting updated.
Please suggest some changes for it to work. The code is copied below:
Public Function matrixmultiply(x() As Double, y() As Double) As Double()
Dim nrow1 As Integer, nrow2 As Integer, ncol1 As Integer, ncol2 As Integer, i As Integer, j As Integer, k As Integer, temp() As Double
nrow1 = UBound(x, 1) - LBound(x, 1) + 1
ncol1 = UBound(x, 2) - LBound(x, 2) + 1
nrow2 = UBound(y, 1) - LBound(y, 1) + 1
ncol2 = UBound(y, 2) - LBound(y, 2) + 1
ReDim matrixmultiply(1 To nrow1, 1 To ncol2)
ReDim temp(1 To nrow1, 1 To ncol2)
For i = 1 To nrow1
For j = 1 To ncol2
d = 2
For k = 1 To col1
temp(i, j) = temp(i, j) + x(i, k) * y(k, j)
Next k
Next j
Next i
matrixmultiply = temp
End Function
Private Sub CommandButton1_Click()
Dim x(1 To 3, 1 To 3) As Double, y(1 To 3, 1 To 3) As Double, z() As Double
Dim i As Integer, j As Integer
For i = 1 To 3
For j = 1 To 3
x(i, j) = Cells(i, j).Value
y(i, j) = Cells(i, j + 5).Value
Next j
Next i
z = matrixmultiply(x, y)
For i = 1 To 3
For j = 1 To 3
Cells(i, j + 12).Value = z(i, j)
Next j
Next i
End Sub
Silly mistake in the line:
For k = 1 To col1
It should, instead, be
For k = 1 To ncol1
Using Option Explicit would have saved a lot of hurt!
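For reference, a sketch of the function with that fix applied and Option Explicit in place. It assumes both input arrays are 1-based, as in the calling code above; the name MatrixMultiplyFixed is only there to keep it apart from the original.

```vba
Option Explicit

' Sketch of the corrected routine; assumes x and y use 1-based indices
Public Function MatrixMultiplyFixed(x() As Double, y() As Double) As Double()
    Dim nrow1 As Long, ncol1 As Long, ncol2 As Long
    Dim i As Long, j As Long, k As Long
    Dim temp() As Double

    nrow1 = UBound(x, 1) - LBound(x, 1) + 1
    ncol1 = UBound(x, 2) - LBound(x, 2) + 1
    ncol2 = UBound(y, 2) - LBound(y, 2) + 1

    ReDim temp(1 To nrow1, 1 To ncol2)
    For i = 1 To nrow1
        For j = 1 To ncol2
            ' ncol1, not the undeclared col1, bounds the inner sum
            For k = 1 To ncol1
                temp(i, j) = temp(i, j) + x(i, k) * y(k, j)
            Next k
        Next j
    Next i

    MatrixMultiplyFixed = temp
End Function
```

With Option Explicit at the top of the module, the original typo col1 and the stray d = 2 would both be reported as "Variable not defined" at compile time, where the original code silently treated col1 as 0 and skipped the inner loop.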