On making MATCH function like FIND function - vba

I'm trying to make MATCH function work like FIND function. First of all, I generate the dummy data to be use for testing. Here is the routine I use:
Sub Data_Generator()
Randomize
Dim Data(1 To 100000, 1 To 1)
Dim p As Single
For i = 1 To 100000
p = Rnd()
If p < 0.4 Then
Data(i, 1) = "A"
ElseIf p >= 0.4 And p <= 0.7 Then
Data(i, 1) = "B"
Else
Data(i, 1) = "C"
End If
Next i
Range("A1:A100000") = Data
End Sub
Now, I create a sub-routine to find the string A in the range Data. There are two methods I use here that employ MATCH function. The first method is to reset the range of lookup array like the following code:
Sub Find_Match_1()
T0 = Timer
Dim i As Long, j As Long, k As Long, Data As Range
Dim Output(1 To 100000, 1 To 1)
On Error GoTo Finish
Do
Set Data = Range(Cells(j + 1, 1), "A100000") 'Reset the range of lookup array
i = WorksheetFunction.Match("A", Data, 0)
j = j + i
Output(j, 1) = j 'Label the position of A
k = k + 1 'Counting the number of [A] found
Loop
Finish:
Range("B1:B100000") = Output
InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0
End Sub
And for the second method, I assign the cell of range where A is located by value vbNullString instead of resetting Range("A1:A100000"). The idea is to delete the string A after being found and to expect MATCH function to find the next string A in the Range("A1:A100000"). Here is the code to implement the second method:
Sub Find_Match_2()
T0 = Timer
Dim n As Long, i As Long, j As Long
Dim Data_Store()
Dim Output(1 To 100000, 1 To 1)
Data_Store = Range("A1:A100000")
On Error GoTo Finish
Do
j = WorksheetFunction.Match("A", Range("A1:A100000"), 0)
Output(j, 1) = j
Cells(j, 1) = vbNullString
n = n + 1
Loop
Finish:
Range("A1:A100000") = Data_Store
Range("B1:B100000") = Output
InputBox "The number of [A] found are " & n & " in", "Process is complete", Timer - T0
End Sub
The goal is to determine which method is better at employing MATCH function in its performance. It turns out the first method only completes less than 0.4 seconds meanwhile the second method completes about a minute on my PC. So my questions are:
Why does the second method take time too long to complete?
How does one improve the performance of the second method?
Can MATCH function be used in an array?

I agree that this is more of a Code Review question, but I chose to look into it for my own curiosity, so I'll share what I found.
I think you're hitting a very classic case of N vs N^2 computational complexity. Look at your two methods, which seem remarkably similar, and consider what they're actually doing, keeping in mind that the MATCH function is probably just a linear search when you use Match_type=0 (because your data is unsorted, whereas other match types could do a binary search on your sorted data).
Method 1:
Start at A1
Continue down the range until an "A" is found
Restart at the cell below the MATCH
Method 2:
Start at A1
Continue down the range until an "A" is found
Clear the "A"
Restart at A1
It should be instantly apparent that while one method is continually shrinking the range it searches, the other is always starting at the first cell and searching the whole range. This will account for some of the speedup, and already boosts Method 1 to a nice lead, but it's not even nearly the full story.
The real key lies in the amount of work Match has to do for each situation. Because its range constantly shrinks and moves its start further down the list, whichever cell Method 1's Match starts from, it only has to search a small number of cells before it hits an A and resumes the outer loop. Meanwhile, Method 2 is continually destroying A's, making them less and less dense and forcing itself to search more and more of the range before getting any hits. By the end, Method 2 is looping through almost 100,000 empty cells/B's/C's before finding its next A.
So on average, the Match for Method 1 is only looking through a couple of cells each time, while the Match for Method 2 is taking longer and longer as time goes on, until the end when it is forced to loop through the entire range. On top of that, Method 2 is doing a bunch of writes to cell values, which is slower than you might think when you have to do it tens of thousands of times.
In all honesty, your best bet would be to just loop through the cells yourself once, looking for A's and handling them as you go. MATCH brings no advantage to the table, and Method 1 is basically just a more complicated version of the loop I described.
I'd write this something like:
Sub Find_Match_3()
T0 = Timer
Dim k As Long, r As Range
Dim Output(1 To 100000, 1 To 1)
For Each r In Range("A1:A100000").Cells
If r.Value = "A" Then
Output(r.Row, 1) = r.Row 'Label the position of A
k = k + 1 'Counting the number of [A] found
End If
Next
Range("B1:B100000") = Output
InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0
End Sub
Which is about 30% faster on my machine.

Related

VBA : extensive use of global variables for referencing columns?

Context
I have a big macro where I have declared lots of global constant in a dedicated module (i.e, a module which contains only Public Const declarations).
About 100 of those global constants are used to assign columns names to each column of my main Data worksheet :
Public Const columnName = "A"
Public Const columnCity = "B"
Public Const columnPhone = "C"
...
Public Const columnColor = "CX"
This let me reference to the columns (of my Data worksheet, from the 10 other modules) using .Range(columnColor & l) instead of using .Range("CX" & l) (where l is obviously the row number) . This is much easier to code (I don't need to search for the right column) or to update if I decide to insert a column before "F" (I only have to update my const module and not the 10 other code modules).
However, it looks like using .Range(columnCity & l) is notably slower than using .Range("A" & l). (SEE EDIT BELOW)
The most processors intensive tasks are done using big 2D arrays. But I'm still probably calling those global column variables 100 000 times in some subs, since I'm not only checking/updating values/formulas (which I could do on a 2D array) but also dealing with cell's .Interior.Color, .Comment.Text ...
Question
How bad an idea is it to use such global variables (Public Const columnName...) to reference columns ?
Is there some standard way of doing so ?
Edit
As pointed by Tim, I think I indeed spent time changing every .Cells(l, 1) to .Range(columnName & 1) when I refactored my code to use the column variable. That means that :
My problem probably comes from using .Range vs .Cells rather than from the global variables.
I should probably refactor back to .Cells(l, colIndexName).
There is no "standard" way to do this, so you should run some performance tests and figure out the most efficient method if performance is an issue for you.
For example, using a numeric constant and Cells() seems to be about twice as fast as using Range():
Option Explicit
Public Const columnName As String = "A"
Public Const colIndexName As Long = 1
Sub Tester()
Dim l As Long, v, t
t = Timer
With Sheet1
For l = 1 To 300000#
v = .Range(columnName & 1).Value
Next l
End With
Debug.Print Timer - t '>> approx. 1.3 sec
t = Timer
With Sheet1
For l = 1 To 300000#
v = .Cells(1, colIndexName).Value
Next l
End With
Debug.Print Timer - t '>> approx. 0.6 sec
End Sub
However, it's likely only twice as fast if that's all you're doing - as soon as you add in other tasks that difference may wash out.
Need an Answer to provide formatting and results, though this is more of a comment.
I have found no significant difference between .Range(columnCity & l) and .Range("A" & l). Can you provide more insight on how you came to this conclusion?
Here is my code for speed comparison:
Public Const p_sCol As String = "A"
Sub tgr()
Dim ws As Worksheet
Dim i As Long, l As Long
Dim sTemp As String
Dim dTimer As Double
Dim aResults(1 To 1, 1 To 2) As Double
Set ws = ActiveWorkbook.ActiveSheet
l = 1
dTimer = Timer
For i = 1 To 100000
sTemp = vbNullString
sTemp = ws.Range(p_sCol & l).Value
Next i
aResults(1, 1) = Timer - dTimer
dTimer = Timer
For i = 1 To 100000
sTemp = vbNullString
sTemp = ws.Range("A" & l).Value
Next i
aResults(1, 2) = Timer - dTimer
ws.Range("C1:D1").Value = aResults
End Sub
I ran the test 10 times, and the average result for the public variable concatenation over 100,000 iterations was 0.4375 seconds while the average result for hard coding the column letter was 0.429688 seconds, which is a difference of 0.007813 seconds. Sure, the hard coded method was slightly faster, but not noticeably and certainly not significantly.

VBA: adding random numbers to a grid that arent already in the grid

Sub FWP()
Dim i As Integer
Dim j As Integer
Dim n As Integer
n = Range("A1").Value
For i = 1 To n
For j = 1 To n
If Cells(i + 1, j) = 0 Then
Cells(i + 1, j).Value = Int(((n ^ 2) - 1 + 1) * Rnd + 1)
ElseIf Cells(i + 1, j) <> 0 Then
Cells(i + 1, j).Value = Cells(i + 1, j).Value
End If
Next j
Next i
I am trying to do a part of a homework question that asks to fill in missing spaces in a magic square in VBA. It is set up as a (n x n) matrix with n^2 numbers in; the spaces I need to fill are represented by zeros in the matrix. So far I have some code that goes through checking each individual cell value, and will leave the values alone if not 0, and if the value is 0, it replaces them with a random number between 1 and n^2. The issue is that obviously I'm getting some duplicate values, which isn't allowed, there must be only 1 of each number.
How do I code it so that there will be no duplicate numbers appearing in the grid?
I am attempting to put in a check function to see if they are already in the grid but am not sure how to do it
Thanks
There are a lot of approaches you can take, but #CMArg is right in saying that an array or dictionary is a good way of ensuring that you don't have duplicates.
What you want to avoid is a scenario where each cell takes progressively longer to populate. It isn't a problem for a very small square (e.g. 10x10), but very large squares can get ugly. (If your range is 1-100, and all numbers except 31 are already in the table, it's going to take a long time--100 guesses on average, right?--to pull the one unused number. If the range is 1-40000 (200x200), it will take 40000 guesses to fill the last cell.)
So instead of keeping a list of numbers that have already been used, think about how you can effectively go through and "cross-off" the already used numbers, so that each new cell takes exactly 1 "guess" to populate.
Here's one way you might implement it:
Class: SingleRandoms
Option Explicit
Private mUnusedValues As Scripting.Dictionary
Private mUsedValues As Scripting.Dictionary
Private Sub Class_Initialize()
Set mUnusedValues = New Scripting.Dictionary
Set mUsedValues = New Scripting.Dictionary
End Sub
Public Sub GenerateRange(minimumNumber As Long, maximumNumber As Long)
Dim i As Long
With mUnusedValues
.RemoveAll
For i = minimumNumber To maximumNumber
.Add i, i
Next
End With
End Sub
Public Function GetRandom() As Long
Dim i As Long, keyID As Long
Randomize timer
With mUnusedValues
i = .Count
keyID = Int(Rnd * i)
GetRandom = .Keys(keyID)
.Remove GetRandom
End With
mUsedValues.Add GetRandom, GetRandom
End Function
Public Property Get AvailableValues() As Scripting.Dictionary
Set AvailableValues = mUnusedValues
End Property
Public Property Get UsedValues() As Scripting.Dictionary
Set UsedValues = mUsedValues
End Property
Example of the class in action:
Public Sub getRandoms()
Dim r As SingleRandoms
Set r = New SingleRandoms
With r
.GenerateRange 1, 100
Do Until .AvailableValues.Count = 0
Debug.Print .GetRandom()
Loop
End With
End Sub
Using a collection would actually be more memory efficient and faster than using a dictionary, but the dictionary makes it easier to validate that it's doing what it's supposed to do (since you can use .Exists, etc.).
Nobody is going to do your homework for you. You would only be cheating yourself. Shame on them if they do.
I'm not sure how picky your teacher is, but there are many ways to solve this.
You can put the values of the matrix into an array.
Check if a zero value element exists, if not, break.
Then obtain your potential random number for insertion.
Iterate through the array with a for loop checking each element for this value. If it is not present, replace the zero element.

Lee-Ready tick test using VBA

I am trying to build Lee-Ready tick test for estimating trade direction from tick data using Excel. I have a dataset containing the trade prices in descending order, and I am trying to build a VBA code that is able to loop over all the 4m+ cells in as efficient manner as possible.
The rule for estimating trade direciton goes as follows:
If Pt>Pt-1, then d=1
If Pt<Pt-1, then d=-1
If Pt=Pt-1, then d is the last value taken by d.
So to give a concrete example, I would like to transform this:
P1;P2;P3;P4
1.01;2.02;3.03;4.04
1.00;2.03;3.03;4.02
1.01;2.02;3.01;4.04
1.00;2.03;3.00;4.04
into this
d1;d2;d3;d4
1;-1;1;1
-1;1;1;-1
1;-1;1;0
0;0;0;0
Fairly straightforward nested loops suffice:
Function LeeReady(Prices As Variant) As Variant
'Given a range or 1-based, 2-dimensional variant array
'Returns an array of same size
'consisiting of an array of the same size
'of trade directions computed according to
'Lee-Ready rule
Dim i As Long, j As Long
Dim m As Long, n As Long
Dim priceData As Variant, directions As Variant
Dim current As Variant, previous As Variant
If TypeName(Prices) = "Range" Then
priceData = Prices.Value
Else
priceData = Prices
End If
m = UBound(priceData, 1)
n = UBound(priceData, 2)
ReDim directions(1 To m, 1 To n) As Long 'implicitly fills in bottom row with 0s
For i = m - 1 To 1 Step -1
For j = 1 To n
current = priceData(i, j)
previous = priceData(i + 1, j)
If current > previous Then
directions(i, j) = 1
ElseIf current < previous And previous > 0 Then
directions(i, j) = -1
Else
directions(i, j) = directions(i + 1, j)
End If
Next j
Next i
LeeReady = directions
End Function
This can be called from a sub or used directly on the worksheet:
Here I just highlighted a block of cells of the correct size to hold the output and then used the formula =LeeReady(A2:D5) (pressing Ctrl+Shift+Enter to accept it as an array formula).
On Edit: I modified the code slightly (by adding the clause And previous > 0 to the If statement in the main loop) so that it can now handle ranges in which come of the columns have more rows than other columns. The code assumes that price data is always > 0 and fills in the return array with 0s as place holders in the columns that end earlier than other columns:

Get maximum number of characters that a cell can contain

Excel specifications and limits says:
Total number of characters that a cell can contain: 32,767 characters
Is there a way to get this number programatically?
I'm asking because hardcoding constants should, in general, be avoided if and when feasible. This number may conceivably change by Office version (It hasn't changed between 2003 and 2013, but who knows what Microsoft has in store for us).
It's pretty easy to get the maximum number of rows in a worksheet:
Sheet1.Rows.Count ' returns 65,536 in Office 2003 and 1,048,576 in Office 2007-2013
but apparently, getting the maximum number of characters that a cell can contain isn't as straightforward.
Note that writing too many characters to a cell will not result in an error; it will silently fail and truncate the string — so proper error handling isn't an option here.
In a loop, append characters one by one to the cell contents. Each time, read cell contents, check if the last character added is present. If it isn't then that's the limit.
Upside: Works and is 100% reliable.
Downside: Really slow. It takes 10-15 seconds to complete, due to the many read-writes to/from sheet.
Obviously, this could be optimised by using a good guess (e.g. 32,767) as the initial condition, and using a hunt & bisect search algorithm rather than incrementing by 1. However if the answer is far enough away from the initial guess, this might still take ~1 second to run — not something you would want to call repeatedly.
Function MaximumNumberOfCharactersACellCanContain(r As Range)
'NB: Range r will be overwritten.
Dim sIn As String
Dim sOut As String
Dim i As Long
Application.ScreenUpdating = False
Do
i = i + 1
sIn = sIn & Chr(97 + (i - 1) Mod 26)
r.Cells(1, 1).Value = sIn
sOut = r.Cells(1, 1).Value
If Right(sOut, 1) <> Right(sIn, 1) Then Exit Do
'If Len(sOut) <> Len(sIn) Then Exit Do
Loop
Application.ScreenUpdating = True
MaximumNumberOfCharactersACellCanContain = i - 1
End Function
Example usage:
MsgBox MaximumNumberOfCharactersACellCanContain(Range("A1"))
Alternative: Loop appending a chunk until the assigned length is not whats expected
Const INT_MAX As Integer = 32767
Dim i As Long
ActiveCell.Value = ""
Dim buff As String: buff = Space$(INT_MAX)
Do
i = i + 1
ActiveCell.Value = ActiveCell.Value & buff
If Len(ActiveCell.Value) <> (i * INT_MAX) Then
MaxLen = Len(ActiveCell.Value)
Exit Function
End If
Loop
Or even
ActiveCell.Value = Space$(A_BIG_NUMBER)
MaxLen = Len(ActiveCell.Value)
Here's a variant where we take exponential steps (larger and larger steps whose size increases by a factor stepFactor each time).
Function MaximumNumberOfCharactersACellCanContain(r As Range, _
Optional ByVal stepFactor As Double = 2)
Dim n As Double
Dim nActual As Long
Dim l As Long
n = 1
Do
n = n * stepFactor
nActual = CLng(n)
r.Cells(1, 1).Value = Space$(nActual)
l = Len(r.Cells(1, 1).Value)
If l <> nActual Then
MaximumNumberOfCharactersACellCanContain = l
Exit Function
End If
Loop
End Function
Example usage:
Debug.Print MaximumNumberOfCharactersACellCanContain(Range("A1"), 8)
The choice of stepFactor is a compromise between:
Reducing the number of iterations (larger factor is better), and
Limiting down the cost of the last iteration (the one that fails). If stepFactor is too large, then you're writing a very long string to the cell and this is quite slow.
Making sure the last iteration will never hit the out of memory ceiling (~130 million characters on my system). (Could add error handling do deal with this eventuality.)
stepFactor somewhere between 2 and 8 should be robust and quick.

VBA Filter Function for dynamic array doesn't seem to be filtering on occasion

I am writing a subroutine in VBA to cycle through all the listed job numbers in a multi-tab time sheet and create a list of all job numbers that have been used (so it takes the original list (with possibly multiple job number occurrences) and creates a list with only one occurrence of each job number. The job numbers on each sheet are found in range("A8:A30"). The code below seems to work for the first several job names on the sample that I'm testing, but then seems to stop filtering. A8:A21 of the first sheet is:
14GCI393
14GCI393
13GCI373
13GCI373
13GCI388
13GCI367:2
14GCI408
14GCI408
13GCI373
13GCI388
14GCI415
14GCI415
00GCI000
And the code is:
Sub listusedjobs()
Dim usedjobs() As String
Dim nextjob As String
Dim i, m, n, lastsheetindexnumber As Integer
Application.ScreenUpdating = False
lastsheetindexnumber = ThisWorkbook.Sheets.Count
m = 0
ReDim usedjobs(m)
usedjobs(m) = "initialize"
For i = 1 To lastsheetindexnumber
Sheets(i).Activate
For n = 8 To 30
nextjob = Range("A" & n).Value
If Not IsInArray(nextjob, usedjobs) Then 'determine if nextjob is already in usedjobs()
ReDim usedjobs(m)
usedjobs(m) = nextjob 'Add each unique job to array "usedjobs"
Sheets(lastsheetindexnumber).Cells(m + 40, 1).Value = nextjob 'Print job name that was just added
m = m + 1
End If
Next n
Next i
Application.ScreenUpdating = True
End Sub
Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
IsInArray = (UBound(Filter(arr, stringToBeFound, , vbTextCompare)) > -1)
End Function
Any help figuring out what is going wrong will be much appreciated! The current output I get for this code is below and contains multiple doubles.
14GCI393
13GCI373
13GCI388
13GCI367:2
14GCI408
13GCI373
13GCI388
14GCI415
00GCI000
I think that your problem may be not using ReDim Preserve inside your If Not