There is a famous and the fastest permutation code without any "function" for VB .Net to permutate numbers just in several rows, that I can't remember unfortunately.
Is there anyone know this code? Or know like this?
Some part of the code is here:
UPDATE: I FOUND IT. ALL THE WORKING CODES HERE:
Dim L(4) As Byte
Dim I As Byte
Dim K As Byte
Dim J As Byte
Dim RESULTS As String
Dim UB, UBm1 As Integer
L = {1, 2, 3, 4, 5}
UB = L.GetUpperBound(0)
UBm1 = UB - 1
Do
I = UBm1
Do While I > 0 And L(I) >= L(I + 1)
I -= 1
Loop
K = L(I)
J = UB
Do While J > 0 And L(J) <= K
J -= 1
Loop
RESULTS = L(0) & "," & L(1) & "," & L(2) & "," & L(3) & "," & L(4)
L(I) = L(J)
L(J) = K
Array.Reverse(L, I + 1, UB - I)
Loop While J
It seems like you're looking for this ...
http://www.cut-the-knot.org/do_you_know/AllPerm.shtml
(2. Lexicographic order and finding the next permutation)
... ?
In case you are, the starting values for I and J are incorrect (they should be 4 and 5 instead of 3 and 4 respectively).
(I know the example uses swap, but that can be replaced with a single colon-delimited line.)
Dim L(4) As Byte
L = {1, 2, 3, 4, 5}
Dim K as Byte
For N as integer = 1 to 120 'No. of permutations: 5!
Dim I As Byte = 4, J as Byte = 5
While L(I - 1) >= L(I)
I -= 1
End While
While L(J - 1) <= L(I - 1)
J -= 1
End While
K = L(I - 1) : L(I - 1) = L(J - 1) : L(J - 1) = K
I += 1 : J = 5
While I < J
K = L(I - 1) : L(I - 1) = L(J - 1) : L(J - 1) = K
I += 1 : J -= 1
End While
Dim RESULT as String = L(0) & "," & L(1) & "," & L(2) & "," & L(3) & "," & L(4)
'Export / print RESULT as you like, e.g. Console.WriteLine or something
Next
For obtaining a set of permutations of a natural number (recommended less than 10 though it can be bigger than that), this VBA Excel sub-routine is very fast and short. I believe it can be easy to convert it to VB.NET. Have a look.
Const P = 5 'Recommended Max P = 9
Dim m As Long, PArray(1 To 1000, 1 To P) 'Recommended Max 9! = 362880
Public Sub PermutationNaturalNumber()
Dim Q(1 To P) As Long
For m = 1 To P: Q(m) = m: Next: m = 0
PermutationGenerator P, Q
Range("A1").Resize(UBound(PArray), P) = PArray: End
End Sub
Sub PermutationGenerator(n As Long, Q() As Long)
Dim i As Long, j As Long, k As Long
If n > 1 Then
For i = 1 To n - 1
PermutationGenerator n - 1, Q
If n Mod 2 = 1 Then j = 1 Else j = i
k = Q(j): Q(j) = Q(n): Q(n) = k
Next
PermutationGenerator n - 1, Q
Else
m = m + 1: For i = 1 To P: PArray(m, i) = Q(i): Next
End If
End Sub
P can be any natural number, in this case {1, 2, 3, 4, 5}. Make sure you change the upper bound of PArray with P! meaning (P*(P-1)*(P-2)*...*1).
Related
I want to sort below Two-digit array by VBA code
A 1
B 2
A 1
C 3
or below:
1 A
2 B
1 A
3 C
I have tried to sort them by Dictionary, but, Dictionary is not allowed to insert duplate key.
Is there any want to sort above array by number 1,2,3
I made this some time ago, it might help.
Function ArraySorter(ByVal RecArray As Variant, Optional ByVal RefCol As Integer = 0) As Variant
Dim Menor As String
Dim NewArray() As Variant
Dim i As Double, j As Double
Dim menorIndex As Double
Dim NewArrayIndex() As Double
Dim UsedIndex() As Double
ReDim NewArrayIndex(UBound(RecArray, 2))
ReDim NewArray(UBound(RecArray), UBound(RecArray, 2))
For i = 0 To UBound(NewArrayIndex)
NewArrayIndex(i) = -1
Next i
UsedIndex = NewArrayIndex
For i = 0 To UBound(RecArray, 2)
Menor = ""
menorIndex = -1
For j = 0 To UBound(RecArray, 2)
If UsedIndex(j) = -1 Then
If Menor = "" Then
Menor = RecArray(RefCol, j)
menorIndex = j
Else
If RecArray(RefCol, j) < Menor Then
Menor = RecArray(RefCol, j)
menorIndex = j
End If
End If
End If
Next j
UsedIndex(menorIndex) = 1
NewArrayIndex(i) = menorIndex
Next i
For i = 0 To UBound(NewArrayIndex)
For j = 0 To UBound(NewArray)
NewArray(j, i) = RecArray(j, NewArrayIndex(i))
Next j
Next i
ArraySorter = NewArray
End Function
If you have something like:
Function testArraySorter()
Dim myArr() As Variant
ReDim myArr(1, 3)
myArr(0, 0) = "A"
myArr(0, 1) = "B"
myArr(0, 2) = "A"
myArr(0, 3) = "C"
myArr(1, 0) = 1
myArr(1, 1) = 2
myArr(1, 2) = 1
myArr(1, 3) = 3
myArr = ArraySorter(myArr)
For i = 0 To UBound(myArr, 2)
Debug.Print myArr(0, i), myArr(1, i)
Next i
End Function
you'll get this in your immediate verification :
A 1
A 1
B 2
C 3
If you need to sort based in two or more columns, you could add a dummy column into your array, concatenate the criteria columns into it and then set this dummy column as RefCol: myArr = ArraySorter(myArr, addedColNumberHere).
Hope this helps.
I've seen a couple similar questions, but nothing I've found quite hits the mark with me.
I have two tables on two different sheets. The tables themselves are structured identically, but hold different content. Each data cell within the table can have MULTIPLE lines of information (e.g. E5 is 5 lines (actual lines, not wrapped) of data).
I'd like to create a third table that is also structurally identical (row 1 and column A are the same) to the first two, but have each data cell contain in some meaningful syntax the difference between the same two cells in the first two tables. It could be as straightforward as the output of a diff between the contents of the first two cells.
Is there a way to essentially take the diff of sheet1!E5 and sheet2!E5 and store it in sheet3!E5 ?
EDIT:
Some clarification. All the data cells contain information on MULTIPLE lines. Some lines are present in one sheet but not in another. For instance, sheet1!E5 may contain "string1", "string2", and "string3" on THREE lines whereas sheet2!E5 contains "string2", "string3", "string7", and "string8" on FOUR lines. The output on sheet3!E5 could be anything that clearly marks the difference between the two, for instance (as was mentioned before), exactly what the output would be if the two cells were text files and you ran a diff (or diff -c) on them like this:
*** 1,3 ****
- string1
string2
! string3
\ No newline at end of file
--- 1,4 ----
string2
! string3
! string7
! string8
\ No newline at end of file
or even just something much simpler that lists the strings that are/are not in each cell like:
< string1
<> string2
<> string3
> string7
> string8
Using the code at https://en.wikipedia.org/wiki/Longest_common_subsequence_problem I made the following VBA code. It isn't verbatim because VBA doesn't allow short circuit operators and I needed PrintDiff to return the string instead of displaying it.
Function LCSLength(C() As Integer, X() As String, Y() As String, M As Integer, N As Integer) As Integer
Dim I As Integer
For I = 0 To M
C(I, LBound(Y)) = 0
Next
Dim J As Integer
For J = 0 To N
C(LBound(X), J) = 0
Next
For I = 1 To M
For J = 1 To N
If X(I) = Y(J) Then
C(I, J) = C(I - 1, J - 1) + 1
ElseIf C(I, J - 1) < C(I - 1, J) Then
C(I, J) = C(I - 1, J)
Else
C(I, J) = C(I, J - 1)
End If
Next
Next
LCSLength = C(M, N)
End Function
Function PrintDiff(C() As Integer, X() As String, Y() As String, I As Integer, J As Integer) As String
Continue = 1
If Continue = 1 And I > 0 And J > 0 Then
If X(I) = Y(J) Then
PrintDiff = PrintDiff(C, X, Y, I - 1, J - 1) & Chr(10) & "<> " & X(I)
Continue = 0
End If
End If
If Continue = 1 And J > 0 Then
If I = 0 Then
PrintDiff = PrintDiff(C, X, Y, I, J - 1) & Chr(10) & "> " & Y(J)
Continue = 0
ElseIf C(I, J - 1) >= C(I - 1, J) Then
PrintDiff = PrintDiff(C, X, Y, I, J - 1) & Chr(10) & "> " & Y(J)
Continue = 0
End If
End If
If Continue = 1 And I > 0 Then
If J = 0 Then
PrintDiff = PrintDiff(C, X, Y, I - 1, J) & Chr(10) & "< " & X(I)
Continue = 0
ElseIf C(I, J - 1) < C(I - 1, J) Then
PrintDiff = PrintDiff(C, X, Y, I - 1, J) & Chr(10) & "< " & X(I)
Continue = 0
End If
End If
If Continue = 1 Then
PrintDiff = ""
End If
End Function
Function Diff(A As String, B As String) As String
Dim X() As String
X = Split(Chr(10) & A, Chr(10))
Dim M As Integer
If (A = "") Then
M = 0
Else
M = UBound(X)
End If
Dim Y() As String
Y = Split(Chr(10) & B, Chr(10))
Dim N As Integer
If (B = "") Then
N = 0
Else
N = UBound(Y)
End If
Dim C() As Integer
ReDim C(M, N) As Integer
Call LCSLength(C, X, Y, M, N)
Diff = Mid(PrintDiff(C, X, Y, M, N), 2)
End Function
If A1 contains:
string1
string2
string3
And B1 contains:
string2
string3
string7
string8
And if C1 contains =Diff(A1,B1) then C1 will display:
< string1
<> string2
<> string3
> string7
> string8
According to the "Code optimization" section on that page several optimizations can be made to the code - I haven't done those in this code.
Yes. duplicate your structure in Sheet 3, and simply make sure all your references are accurate. For example, in cell A1 on sheet 3, you would just type:
= [navigate to sheet1!A1 and click it] - [navigate to sheet2!A1 and click it]
pretty easy, actually.
I want to create a recursive function in Excel VBA without using nested Loops. I used GoTo to do it as I think it is very fast as compared to For loop etc. PROBLEM: problem is that the first Label i.e. 'a' does not perform all iterations and the required combinations are not returned so. From the given array 'arr' there should be 39 combinations but just 14 are returned. I i try to change some lines of code the total iterations 'iNum' returns 39, but not 39 combinations (combinations starting from 'a' are always missing). Please help, Thanks.
Function rec_n()
Dim a As Integer, b As Integer, c As Integer
Dim aSize As Integer, iNum As Integer
Dim myStr As String
'Dim arr As Variant
Dim arr(5) As String
'arr = Array("a", "b", "c", "d")
arr(0) = "a"
arr(1) = "b"
arr(2) = "c"
'arr(3) = "d"
aSize = 3 - 1
'a = 0: b = 0: c = 0
a: If a < aSize Then
myStr = myStr & arr(a) & ", "
a = a + 1: iNum = iNum + 1
b: If b < aSize Then
myStr = myStr & arr(a) & arr(b) & ", "
b = b + 1: iNum = iNum + 1
c: If c < aSize Then
'On Error Resume Next
myStr = myStr & arr(a) & arr(b) & arr(c) & ", "
c = c + 1: iNum = iNum + 1
GoTo c
Else
c = 0
'MsgBox c
End If
GoTo b
Else
b = 0
'MsgBox b
End If
GoTo a
End If
EndFunc:
MsgBox iNum & vbLf & myStr
Range("a2").Value = myStr
End Function
EDITED:
The code is resulting in just these combinations:
a, ba, bba, bbb, bb, bca, bcb, b, ca, cba, cbb, cb, cca, ccb,
where as these 39 are expected:
a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, aab, aac, aba, abb,
abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa,
cab, cac, cba, cbb, cbc, cca, ccb, ccc,
Here is a goto-free recursive approach:
Function StringsFrom(A As Variant, Optional maxlen As Variant) As Variant
'returns a 0-based array of all strings of length <= maxlen
'with elements drawn from A
'A is assumed to be 0-based array
'If maxlen is missing then it is taken to be the number of elements in A
Dim strings As Variant
Dim newstrings As Variant
Dim i As Long, j As Long, k As Long, m As Long, n As Long
If IsMissing(maxlen) Then maxlen = 1 + UBound(A)
m = UBound(A)
If maxlen < 1 Then Exit Function
If maxlen = 1 Then
'basis case -- return a copy of A - coerced to be strings if needed
ReDim newstrings(0 To m)
For i = 0 To m
newstrings(i) = CStr(A(i))
Next i
Else
strings = StringsFrom(A, maxlen - 1)
n = UBound(strings)
ReDim newstrings(0 To n + (m + 1) ^ maxlen)
'first copy strings to newstrings:
For i = 0 To n
newstrings(i) = strings(i)
Next i
k = n + 1 'points to current index in newstrings
'now -- load up the rest using a nested loop:
For i = 0 To m
For j = n + 1 - (m + 1) ^ (maxlen - 1) To n
newstrings(k) = A(i) & strings(j)
k = k + 1
Next j
Next i
End If
StringsFrom = newstrings
End Function
For e.g. maxlen = 4 and A has 5 strings it will first find all strings of length <= maxlen - 1 = 3 and then tack on the characters to those strings of length exactly 3. I had to do a bit of arithmetic to get the indices just right.
Here is some testing code:
Sub test()
Dim start As Double, elapsed As Double, A As Variant, B As Variant
A = Array("a", "b", "c")
B = StringsFrom(A)
MsgBox Join(B, " ") & vbCrLf & 1 + UBound(B) & " strings"
A = Array("a", "b", "c", "d", "e", "f", "g")
start = Timer
B = StringsFrom(A)
elapsed = Timer - start
MsgBox Round(elapsed, 2) & " seconds to process " & 1 + UBound(B) & " strings"
End Sub
The first test correctly gives 3 + 9 + 27 = 39 strings, and the second test gives (on my machine) the message: "0.68 seconds to process 960799 strings". When I increase A more I run out of memory before time is an issue.
ON EDIT: Here is a non-recursive approach. It is slower than the recursive approach but isn't subject to out of memory problems. It is based on the idea that if e.g. your letters are "abc" then you can view e.g. strings of length 4 from those letters as being numbers in base 3 ( = Len("abc")) so to enumerate them just count from 0 to 3^4 -1 = 80, translate each number to base 3, then use the correspondence `0 <=> "a", 1 <=> "b" etc.):
Sub Enumerate(letters As String, maxlen As Long, Optional display As Boolean = True)
'letters is assumed to have no repeated characters
Dim i As Long, j As Long, n As Long, q As Long, r As Long
Dim counter As Long
Dim s As String
Dim A As Variant
n = Len(letters)
ReDim A(0 To n - 1)
For i = 1 To n
A(i - 1) = Mid(letters, i, 1)
Next i
For i = 1 To maxlen
For j = 0 To n ^ i - 1
s = ""
q = j
If q = 0 Then
s = A(0)
Else
Do While q > 0
r = q Mod n
q = Int(q / n)
s = A(r) & s
Loop
End If
s = String(i - Len(s), A(0)) & s
counter = counter + 1
If display Then Debug.Print s
Next j
Next i
Debug.Print counter
End Sub
Tested like:
Sub test2()
Dim start As Double, elapsed As Double
Enumerate "abc", 3
start = Timer
Enumerate "abcdefghijklmnopqrstuvwxyz", 5, False
elapsed = Timer - start
Debug.Print Round(elapsed, 2)
End Sub
output of the time part of the test: shows that it takes (on my machine) about 18 seconds to loop through all of the (over 12.3 million) strings from the lower case standard alphabet of length <= 5. Some improvements are possible, but you aren't going to be able to get the sorts of speed you would need to brute force long strings drawn from a large alphabet.
VBA is an interpreted language. I think of it as a good tool for putzing around the solar system. If you want to explore the galaxy -- use C. If you want to explore other galaxies -- hope that quantum computers can be made to work.
On Further Edit: For fun I wrote a different version of Enumerate. It is about 33% faster than the last version and can loop generate nearly a million strings per second (at least on my somewhat average laptop). It is still based on thinking of strings as numbers in base n = length(letters) but simulates adding 1 to go from 1 number to the next, with an array used to look up which character results from "adding one" to a letter:
Sub Enumerate2(letters As String, maxlen As Long, Optional display As Boolean = True)
'letters is assumed to have no repeated characters
'prints all letter combos of length <= maxlen
'this one simulates the process of adding one to a string
Dim i As Long, j As Long, k As Long, n As Long, p As Long
Dim carry As Boolean
Dim counter As Long
Dim s As String
Dim num As Variant
Dim Successor(127) As String, Z As String, digit As String
n = Len(letters)
For i = 1 To n - 1
Successor(Asc(Mid(letters, i, 1))) = Mid(letters, i + 1, 1)
Next i
Z = Mid(letters, 1, 1) 'the "zero" of the base-n system
Successor(Asc(Mid(letters, n, 1))) = Z
For i = 1 To maxlen
ReDim num(1 To i) 'used to count from 0 to n^i - 1 in base n
For k = 1 To i
num(k) = Z
Next k
For j = 0 To n ^ i - 1
'get current s
s = Join(num, "")
counter = counter + 1
'now add 1 to num
carry = True
p = i 'points to rightmost "digit"
Do While p > 0 And carry
digit = Successor(Asc(num(p)))
If digit <> Z Then carry = False
num(p) = digit
p = p - 1
Loop
'the real code would go here:
If display Then Debug.Print s
Next j
Next i
Debug.Print counter
End Sub
I'm looking for code in VBA to generate all subsets of the items in a passed array.
Below is simple code to select all N choose 2 subsets of array size N.
Looking to augment this for N choose (N-1)... all the way down to N choose 1.
Option Base 1
Sub nchoose2()
iarray = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
n = UBound(iarray)
x = 1
t = 0
r = 0
Do While (n - x) >= 1
For i = 1 To (n - x)
Cells((i + t), 1) = iarray(x)
Cells((i + t), 2) = iarray(i + x)
Next i
x = x + 1
t = t + (n - (1 + r))
r = r + 1
Loop
End Sub
In addition to the Gray-code algorithm, you can also exploit the correspondence between subsets of an n-element set and binary vectors of length n. The following code illustrates this approach:
Sub AddOne(binaryVector As Variant)
'adds one to an array consisting of 0s and 1s
'thought of as a binary number in little-endian
'the vector is modified in place
'all 1's wraps around to all 0's
Dim bit As Long, carry As Long, i As Long, n As Long
carry = 1
n = UBound(binaryVector)
i = LBound(binaryVector)
Do While carry = 1 And i <= n
bit = (binaryVector(i) + carry) Mod 2
binaryVector(i) = bit
i = i + 1
carry = IIf(bit = 0, 1, 0)
Loop
End Sub
Function listSubsets(items As Variant) As Variant
'returns a variant array of collections
Dim lb As Long, ub As Long, i As Long, j As Long, numSets As Long
Dim vect As Variant 'binary vector
Dim subsets As Variant
lb = LBound(items)
ub = UBound(items)
ReDim vect(lb To ub)
numSets = 2 ^ (1 + ub - lb)
ReDim subsets(1 To numSets)
For i = 1 To numSets
Set subsets(i) = New Collection
For j = lb To ub
If vect(j) = 1 Then subsets(i).Add items(j)
Next j
AddOne vect
Next i
listSubsets = subsets
End Function
Function showCollection(c As Variant) As String
Dim v As Variant
Dim i As Long, n As Long
n = c.Count
If n = 0 Then
showCollection = "{}"
Exit Function
End If
ReDim v(1 To n)
For i = 1 To n
v(i) = c(i)
Next i
showCollection = "{" & Join(v, ", ") & "}"
End Function
Sub test()
Dim stooges As Variant
Dim stoogeSets As Variant
Dim i As Long
stooges = Array("Larry", "Curly", "Moe")
stoogeSets = listSubsets(stooges)
For i = LBound(stoogeSets) To UBound(stoogeSets)
Debug.Print showCollection(stoogeSets(i))
Next i
End Sub
Running the code results in the following output:
{}
{Larry}
{Curly}
{Larry, Curly}
{Moe}
{Larry, Moe}
{Curly, Moe}
{Larry, Curly, Moe}
I asked a similar question a while back (2005) and received this excellent code from John Coleman:
Sub MAIN()
Dim i As Long, st As String
Dim a(1 To 12) As Integer
Dim ary
For i = 1 To 12
a(i) = i
Next i
st = ListSubsets(a)
ary = Split(st, vbCrLf)
For i = LBound(ary) To UBound(ary)
Cells(i + 1, 1) = ary(i)
Next i
End Sub
Function ListSubsets(Items As Variant) As String
Dim CodeVector() As Integer
Dim i As Integer
Dim lower As Integer, upper As Integer
Dim SubList As String
Dim NewSub As String
Dim done As Boolean
Dim OddStep As Boolean
OddStep = True
lower = LBound(Items)
upper = UBound(Items)
ReDim CodeVector(lower To upper) 'it starts all 0
Do Until done
'Add a new subset according to current contents
'of CodeVector
NewSub = ""
For i = lower To upper
If CodeVector(i) = 1 Then
If NewSub = "" Then
NewSub = Items(i)
Else
NewSub = NewSub & ", " & Items(i)
End If
End If
Next i
If NewSub = "" Then NewSub = "{}" 'empty set
SubList = SubList & vbCrLf & NewSub
'now update code vector
If OddStep Then
'just flip first bit
CodeVector(lower) = 1 - CodeVector(lower)
Else
'first locate first 1
i = lower
Do While CodeVector(i) <> 1
i = i + 1
Loop
'done if i = upper:
If i = upper Then
done = True
Else
'if not done then flip the *next* bit:
i = i + 1
CodeVector(i) = 1 - CodeVector(i)
End If
End If
OddStep = Not OddStep 'toggles between even and odd steps
Loop
ListSubsets = SubList
End Function
The original question and answer:
John Coleman
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
I have excel sheet with data which I want to get Levenshtein Distance between them. I already tried to export as text, read in from script (php), run Levenshtein (calculate Levenshtein Distance), save it to excel again.
But I am looking for a way to programatically calculate a Levenshtein Distance in VBA. How would I go about doing so?
Translated from Wikipedia :
Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer
l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
Next
For j = 0 To l2
d(0, j) = j
Next
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
?Levenshtein("saturday","sunday")
3
Thanks to smirkingman for the nice code post. Here is an optimized version.
1) Use Asc(Mid$(s1, i, 1) instead. Numerical comparision is generally faster than text.
2) Use Mid$ istead of Mid since the later is the variant ver. and adding $ is string ver.
3) Use application function for min. (personal preference only)
4) Use Long instead of Integers since it's what excel natively uses.
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
distance(i, j) = distance(i - 1, j - 1)
Else
distance(i, j) = Application.WorksheetFunction.Min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
End If
Next
Next
Levenshtein = distance(string1_length, string2_length)
End Function
UPDATE:
For those who want it: I think it's safe to say that most people use Levenshtein distance to calculate fuzzy match percentages. Here's a way to do that, and I have added an optimization that you can specify the min. match % to return (default is 70%+. You enter percentags like "50" or "80", or "0" to run the formula regardless).
The speed boost comes from the fact that the function will check if it's even possible that it's within the percentage you give it by checking the length of the 2 strings. Please note there are some areas where this function can be optimized, but I have kept it at this for the sake of readability. I concatenated the distance in result for proof of functionality, but you can fiddle with it :)
Function FuzzyMatch(ByVal string1 As String, _
ByVal string2 As String, _
Optional min_percentage As Long = 70) As String
Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long, result As Long
string1_length = Len(string1)
string2_length = Len(string2)
' Check if not too long
If string1_length >= string2_length * (min_percentage / 100) Then
' Check if not too short
If string1_length <= string2_length * ((200 - min_percentage) / 100) Then
ReDim distance(string1_length, string2_length)
For i = 0 To string1_length: distance(i, 0) = i: Next
For j = 0 To string2_length: distance(0, j) = j: Next
For i = 1 To string1_length
For j = 1 To string2_length
If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
distance(i, j) = distance(i - 1, j - 1)
Else
distance(i, j) = Application.WorksheetFunction.Min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
End If
Next
Next
result = distance(string1_length, string2_length) 'The distance
End If
End If
If result <> 0 Then
FuzzyMatch = (CLng((100 - ((result / string1_length) * 100)))) & _
"% (" & result & ")" 'Convert to percentage
Else
FuzzyMatch = "Not a match"
End If
End Function
Use a byte array for 17x speed gain
Option Explicit
Public Declare Function GetTickCount Lib "kernel32" () As Long
Sub test()
Dim s1 As String, s2 As String, lTime As Long, i As Long
s1 = Space(100)
s2 = String(100, "a")
lTime = GetTickCount
For i = 1 To 100
LevenshteinStrings s1, s2 ' the original fn from Wikibooks and Stackoverflow
Next
Debug.Print GetTickCount - lTime; " ms" ' 3900 ms for all diff
lTime = GetTickCount
For i = 1 To 100
Levenshtein s1, s2
Next
Debug.Print GetTickCount - lTime; " ms" ' 234 ms
End Sub
'Option Base 0 assumed
'POB: fn with byte array is 17 times faster
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, bs1() As Byte, bs2() As Byte
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
Dim min1 As Long, min2 As Long, min3 As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
bs1 = string1
bs2 = string2
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
'slow way: If Mid$(string1, i, 1) = Mid$(string2, j, 1) Then
If bs1((i - 1) * 2) = bs2((j - 1) * 2) Then ' *2 because Unicode every 2nd byte is 0
distance(i, j) = distance(i - 1, j - 1)
Else
'distance(i, j) = Application.WorksheetFunction.Min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
' spell it out, 50 times faster than worksheetfunction.min
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min1 <= min2 And min1 <= min3 Then
distance(i, j) = min1
ElseIf min2 <= min1 And min2 <= min3 Then
distance(i, j) = min2
Else
distance(i, j) = min3
End If
End If
Next
Next
Levenshtein = distance(string1_length, string2_length)
End Function
I think it got even faster... Didn't do much other than improve previous code for speed and results as %
' Levenshtein3 tweaked for UTLIMATE speed and CORRECT results
' Solution based on Longs
' Intermediate arrays holding Asc()make difference
' even Fixed length Arrays have impact on speed (small indeed)
' Levenshtein version 3 will return correct percentage
'
Function Levenshtein3(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long
string1_length = Len(string1): string2_length = Len(string2)
distance(0, 0) = 0
For i = 1 To string1_length: distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length: distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
For j = 1 To string2_length
If smStr1(i) = smStr2(j) Then
distance(i, j) = distance(i - 1, j - 1)
Else
min1 = distance(i - 1, j) + 1
min2 = distance(i, j - 1) + 1
min3 = distance(i - 1, j - 1) + 1
If min2 < min1 Then
If min2 < min3 Then minmin = min2 Else minmin = min3
Else
If min1 < min3 Then minmin = min1 Else minmin = min3
End If
distance(i, j) = minmin
End If
Next
Next
' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein3 = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)
End Function