In VBA, it's known that the For Each loop will iterate through a Collection faster than the For loop, with the difference in time between the two looping methods increasing exponentially(?) as a function of the Collection size. (This assumes that iteration is "in order" over the Collection members of course.)
Why is it faster though? How can the For Each loop access items faster than via their indices in a Collection?
I faced the same problem recently with a collection of instances of some class, and, indeed, for each is further faster than for. I ran some tests to know why and what I found is that the extra delay using for is due find the specified item among all itens in collection.
The test:
I create a simple class containing just an integer as public field and nothing more. Then, was created 100,000 instances of this class and added in a collection
Dim mClass As cls
Dim col As New Collection
For i = 1 To 100000
Set mClass = New cls
mClass.a = 1
Call col.Add(mClass)
Set mClass = Nothing
Next i
then I measured the time to run each one of the following block of code
1) For each (time ~0.95s)
For Each obj In col
obj.a = 1
Next
2a) For through all elements of collection (time ~95s)
For i = 1 To 100000
Set mClass = col(i)
mClass.a = 1
Set mClass= Nothing
Next i
2b) For only the first element (time ~0.15s)
For i = 1 To 100000
Set mClass = col(1)
mClass.a = 1
Set mClass = Nothing
Next i
2c) For only the last element(time ~180s)
For i = 1 To 100000
Set mClass = col(100000)
mClass.a = 1
Set mClass = Nothing
Next i
3) allocating and deallocating memory (time ~0.72s)
For i = 1 To 100000
Set mClass = New cls
mClass.a = 1
Set mClass = Nothing
Next i
4) using another instance out of collection (time ~0.05s)
Dim mclass2 As cls
Set mclass2 = New cls
For i = 1 To 100000
Set mClass = mclass2
mClass.a = 1
Set mClass = Nothing
Next i
The time it takes works more in relative basis than absolute one and give us an idea where time is spended.
Your presumption that For Each is faster than For 1 To is wrong. Both loop through a set of numbers which they can do at identical speed.
In the For Each variant the number identifies objects in a collection, such as Sheets(1), Sheets(2), Sheets(3) etc. These sheets aren't loaded or even accessed in the loop. They are just referenced.
By comparison, For i = 1 to 3: Set Ws = Sheets(i) would just create references to the sheets.
Accordingly, a difference develops from what you do with the referenced objects, not from the way you reference them. For Each often appears as the somewhat simpler code. But if you wish to refer to ActiveSheet.Cells(3) you do need to know whether this will be C1 or A3 and the apparent greater ease of coding comes at the cost of transparency. I treat it as a matter of taste.
Dim Arr As Variant
Dim R As Long
Arr = Range("A1:A20")
For R = 1 To UBound(Arr)
Debug.Print Arr(R, 1)
Next R
is much faster than
Dim Rng As Range
Dim Cell As Range
Set Rng = Range("A1:A20")
For Each Cell in Rng
Debug.Print Cell.Value
Next Cell
But this is because the second code references the sheet 20 times against the first snippet's once. Perhaps it's this difference that you have been reading about.
Related
So I have a really huge excel spreadsheet with about 100 thousands rows. Each row represents a client. Each column stores that client's name, debt, probability of default and so on. There are about 12 columns.
In my code, I loop through the rows and store the data for each client as a class object, then add that object to a collection:
Function getClients(dataWorkbook As Workbook)
Dim resultColl As Collection
Set resultColl = New Collection
...
With dataWorkbook.Worksheets(globals("DATA_SHEET"))
For i = firstRow To lastRow
Set clientCopy = New Client
clientCopy.setClientName = .Cells(i, column_names).value
clientCopy.setContractNumber = .Cells(i, column_contract_numbers).value
...
resultColl.Add clientCopy
Next
End With
Set getClients = resultColl
End Function
Then for each client in the collection I calculate the random numbers and store them in a collection for that client (the collection of size N, depending the number a user wants, usually 1000 - 5000).
Then I go through those random numbers of each client and store an outcome in an outcomes collection, depending on the random number's value (if the number is greater than the client's number P, then I store 0, else 1).
Then I calculate a financial result for each client, meaning, I check for each outcome in the outcomes collection and store appropriate values in losses and profits collections, depending if the outcome is 1 or 0.
These methods inside the Client class:
Public Sub generateRandoms()
Set randomNumbers = New Collection
Dim i As Long
For i = 1 To simulationCount
randomNumbers.Add Rnd()
'Debug.Print "random: " & randomNumbers(i)
Next
End Sub
Public Sub calculateOutcomes()
Set outcomes = New Collection
Dim i As Long
If totalPd <> -1 Then
For i = 1 To simulationCount
If randomNumbers(i) < totalPd Then
outcomes.Add 1
Else
outcomes.Add 0
End If
Next
Else
For i = 1 To simulationCount
outcomes.Add Null
Next
End If
End Sub
Public Sub calculateFinancialResult()
Set losses = New Collection
Set profits = New Collection
Dim i As Long
For i = 1 To outcomes.Count
If outcomes(i) = 1 Then
losses.Add totalLoss
profits.Add 0
ElseIf outcomes(i) = 0 Then
losses.Add 0
profits.Add totalProfit
Else
losses.Add Null
profits.Add Null
End If
Next
End Sub
The code in the main module goes like this:
Dim clientsColl As Collection
Set clientsColl = getClients(dataWorkbook)
Dim clientCopy As Client
For Each clientCopy In clientsColl
clientCopy.setSimulationCount = globals("SIMULATION_COUNT")
clientCopy.generateRandoms 'up to this line it's fast
'clientCopy.calculateOutcomes 'this line is slow, so i have to shut excel down
'clientCopy.calculateFinancialResult
clientCopy.clearSums
Next
MsgBox ("Done for")
While the data collection from the data workbook is rather slow, it takes a reasonable enough amount of time. At the same time, the calculations itself are extraordinary slow. I tried to figure out which function or sub takes the most time. So far it looks like calculateOutcomes is pretty slow, while generateRandoms is quite fast and takes less than a minute.
I don't understand why is that so - both of them loop through each client and store data in a collection N times. At the same time the first is fast, while the second is slow. If I don't terminate the excel eventually I encounter an Out of memory error. Why?
Is there anything I can do about this code's performance? Any suggestions are appreciated. What are the general practices in situations like this?
I have a macro that so far, adds 4 new table columns to an existing table ("Table1"). Now, I would like the macro to format the 3rd and 4th row as percentage. I would like to include this in the loop already listed in my code. I have tried several different ways to do this. I don't think I quite understand how the UBound function works, but hopefully you can understand what I am trying to do.
I also am unsure if I am allowed to continue to utilize the WITH statement in my nested For loop in regards to me 'lst' variable.
#Jeeped - I'm looking at you for this one again...thanks for basically walking me through this whole project lol
Sub attStatPivInsertTableColumns_2()
Dim lst As ListObject
Dim currentSht As Worksheet
Dim colNames As Variant, r1c1s As Variant
Dim h As Integer, i As Integer
Set currentSht = ActiveWorkbook.Sheets("Sheet1")
Set lst = ActiveSheet.ListObjects("Table1")
colNames = Array("AHT", "Target AHT", "Transfers", "Target Transfers")
r1c1s = Array("=([#[Inbound Talk Time (Seconds)]]+[#[Inbound Hold Time (Seconds)]]+[#[Inbound Wrap Time (Seconds)]])/[#[Calls Handled]]", "=350", "=[#[Call Transfers and/or Conferences]]/[#[Calls Handled]]", "=0.15")
With lst
For h = LBound(colNames) To UBound(r1c1s)
.ListColumns.Add
.ListColumns(.ListColumns.Count).Name = colNames(h)
.ListColumns(.ListColumns.Count).DataBodyRange.FormulaR1C1 = r1c1s(h)
If UBound(colNames(h)) = 2 or UBound(colNames(h)) = 3 Then
For i = UBound(colNames(h), 2) To UBound(colNames(h), 3)
.ListColumns(.ListColumns.Count).NumberFormat = "0%"
End if
Next i
Next h
End With
End Sub
You don't need to nest a second for loop. If you want to set the 3rd and 4th columns to a percentage, you only need to set that when the iteration of the loop (h) is 2 or 3 (remembering that arrays index from 0). You also shouldn't cross arrays for the main loop, and since LBound is in most cases 0 you might as well just use that anyway. Try this:
With lst
For h = 0 To UBound(r1c1s)
.ListColumns.Add
.ListColumns(.ListColumns.Count).Name = colNames(h)
.ListColumns(.ListColumns.Count).DataBodyRange.FormulaR1C1 = r1c1s(h)
If h = 2 or h = 3 Then
.ListColumns(.ListColumns.Count).NumberFormat = "0%"
End if
Next h
End With
To answer the other point in your question, UBound(array) just gives the index of the largest element (the Upper BOUNDary) in the given array. So where you have 50 elements in such an array, UBound(array) will return 49 (zero based as mentioned before). LBound just gives the other end of the array (the Lower BOUNDary), which is generally zero.
I have worked in Python earlier where it is really smooth to have a dictionary of lists (i.e. one key corresponds to a list of stuff). I am struggling to achieve the same in vba. Say I have the following data in an excel sheet:
Flanged_connections 6
Flanged_connections 8
Flanged_connections 10
Instrument Pressure
Instrument Temperature
Instrument Bridle
Instrument Others
Piping 1
Piping 2
Piping 3
Now I want to read the data and store it in a dictionary where the keys are Flanged_connections, Instrument and Piping and the values are the corresponding ones in the second column. I want the data to look like this:
'key' 'values':
'Flanged_connections' '[6 8 10]'
'Instrument' '["Pressure" "Temperature" "Bridle" "Others"]'
'Piping' '[1 2 3]'
and then being able to get the list by doing dict.Item("Piping") with the list [1 2 3] as the result. So I started thinking doing something like:
For Each row In inputRange.Rows
If Not equipmentDictionary.Exists(row.Cells(equipmentCol).Text) Then
equipmentDictionary.Add row.Cells(equipmentCol).Text, <INSERT NEW LIST>
Else
equipmentDictionary.Add row.Cells(equipmentCol).Text, <ADD TO EXISTING LIST>
End If
Next
This seems a bit tedious to do. Is there a better approach to this? I tried searching for using arrays in vba and it seems a bit different than java, c++ and python, with stuft like redim preserve and the likes. Is this the only way to work with arrays in vba?
My solution:
Based on #varocarbas' comment I have created a dictionary of collections. This is the easiest way for my mind to comprehend what's going on, though it might not be the most efficient. The other solutions would probably work as well (not tested by me). This is my suggested solution and it provides the correct output:
'/--------------------------------------\'
'| Sets up the dictionary for equipment |'
'\--------------------------------------/'
inputRowMin = 1
inputRowMax = 173
inputColMin = 1
inputColMax = 2
equipmentCol = 1
dimensionCol = 2
Set equipmentDictionary = CreateObject("Scripting.Dictionary")
Set inputSheet = Application.Sheets(inputSheetName)
Set inputRange = Range(Cells(inputRowMin, inputColMin), Cells(inputRowMax, inputColMax))
Set equipmentCollection = New Collection
For i = 1 To inputRange.Height
thisEquipment = inputRange(i, equipmentCol).Text
nextEquipment = inputRange(i + 1, equipmentCol).Text
thisDimension = inputRange(i, dimensionCol).Text
'The Strings are equal - add thisEquipment to collection and continue
If (StrComp(thisEquipment, nextEquipment, vbTextCompare) = 0) Then
equipmentCollection.Add thisDimension
'The Strings are not equal - add thisEquipment to collection and the collection to the dictionary
Else
equipmentCollection.Add thisDimension
equipmentDictionary.Add thisEquipment, equipmentCollection
Set equipmentCollection = New Collection
End If
Next
'Check input
Dim tmpCollection As Collection
For Each key In equipmentDictionary.Keys
Debug.Print "--------------" & key & "---------------"
Set tmpCollection = equipmentDictionary.Item(key)
For i = 1 To tmpCollection.Count
Debug.Print tmpCollection.Item(i)
Next
Next
Note that this solution assumes that all the equipment are sorted!
Arrays in VBA are more or less like everywhere else with various peculiarities:
Redimensioning an array is possible (although not required).
Most of the array properties (e.g., Sheets array in a Workbook) are 1-based. Although, as rightly pointed out by #TimWilliams, the user-defined arrays are actually 0-based. The array below defines a string array with a length of 11 (10 indicates the upper position).
Other than that and the peculiarities regarding notations, you shouldn't find any problem to deal with VBA arrays.
Dim stringArray(10) As String
stringArray(1) = "first val"
stringArray(2) = "second val"
'etc.
Regarding what you are requesting, you can create a dictionary in VBA and include a list on it (or the VBA equivalent: Collection), here you have a sample code:
Set dict = CreateObject("Scripting.Dictionary")
Set coll = New Collection
coll.Add ("coll1")
coll.Add ("coll2")
coll.Add ("coll3")
If Not dict.Exists("dict1") Then
dict.Add "dict1", coll
End If
Dim curVal As String: curVal = dict("dict1")(3) '-> "coll3"
Set dict = Nothing
You can have dictionaries within dictionaries. No need to use arrays or collections unless you have a specific need to.
Sub FillNestedDictionairies()
Dim dcParent As Scripting.Dictionary
Dim dcChild As Scripting.Dictionary
Dim rCell As Range
Dim vaSplit As Variant
Dim vParentKey As Variant, vChildKey As Variant
Set dcParent = New Scripting.Dictionary
'Don't use currentregion if you have adjacent data
For Each rCell In Sheet2.Range("A1").CurrentRegion.Cells
'assume the text is separated by a space
vaSplit = Split(rCell.Value, Space(1))
'If it's already there, set the child to what's there
If dcParent.Exists(vaSplit(0)) Then
Set dcChild = dcParent.Item(vaSplit(0))
Else 'create a new child
Set dcChild = New Scripting.Dictionary
dcParent.Add vaSplit(0), dcChild
End If
'Assumes unique post-space data - text for Exists if that's not the case
dcChild.Add CStr(vaSplit(1)), vaSplit(1)
Next rCell
'Output to prove it works
For Each vParentKey In dcParent.Keys
For Each vChildKey In dcParent.Item(vParentKey).Keys
Debug.Print vParentKey, vChildKey
Next vChildKey
Next vParentKey
End Sub
I am not that familiar with C++ and Python (been a long time) so I can't really speak to the differences with VBA, but I can say that working with Arrays in VBA is not especially complicated.
In my own humble opinion, the best way to work with dynamic arrays in VBA is to Dimension it to a large number, and shrink it when you are done adding elements to it. Indeed, Redim Preserve, where you redimension the array while saving the values, has a HUGE performance cost. You should NEVER use Redim Preserve inside a loop, the execution would be painfully slow
Adapt the following piece of code, given as an example:
Sub CreateArrays()
Dim wS As Worksheet
Set wS = ActiveSheet
Dim Flanged_connections()
ReDim Flanged_connections(WorksheetFunction.CountIf(wS.Columns(1), _
"Flanged_connections"))
For i = 1 To wS.Cells(1, 1).CurrentRegion.Rows.Count Step 1
If UCase(wS.Cells(i, 1).Value) = "FLANGED_CONNECTIONS" Then ' UCASE = Capitalize everything
Flanged_connections(c1) = wS.Cells(i, 2).Value
End If
Next i
End Sub
If I have a loop that commences:
For each c in Range("A1:C8")
Is there a property of the placeholder c (c.count, c.value, c.something,...) that identifies the number of times the loop has iterated thus far? I would rather use something like this than including another variable.
Instead of using a "for each c in range" you can do something like this:
Dim c as Long 'presumably you did this as a Range, just change it to Long.
Dim myRange as Range 'Use a range variable which will be easier to call on later
Set myRange = Range("A1:C8")
For c = 1 to myRange.Cells.Count
'Do something to the cell, identify it as myRange.Cells(c), for example:
myRange.Cells(c).Font.Bold = True '<--- replace with your code that affects the cell
Next
This allows you to do the exact same For/Next loop, without including an unnecessary counter variable. In this case, c is a counter but also serves the purpose of identifying the cell being impacted by the code.
You need to count it yourself like this
Dim i as integer
i = 0
For each c in Range("A1:C8")
i = i + 1
Or
Dim i as integer
Dim c as Range
For i = 0 to Range("A1:C8").Count - 1
Set c = Range("A1:C8").Cells(i)
(Revised)
Using Column or Row properties, as appropriate to the direction you are iterating, you can compute an ordinal number on the fly. Thus
For Each c1 in myRange
myOrdinal = c1.row - myRange.row + 1 ' down contiguous cells in one column
myOrdinal = c1.Column - myRange.Column + 1 ' contiguous columns, L2R
Next
I have a VBA macro that uses Microsoft MapPoint to calculate the distance between two locations for each record in my spreadsheet. I have about 120,000 records to process. The program runs smoothly for about 10,000 iterations then returns a Type Mismatch error where I define the MapPoint locations in my error handler. At which point, I select 'Debug' and then resume execution without editing any code, and it will run successfully for another 10,000 or so records before the same thing happens again.
I've checked my data, and I can't see why there would be a type mismatch, or for that matter why the code would choke on a record one time, and then, without resetting anything, handle the same record upon resuming. Any idea why this would happen?
For reference,
- column M contains locations of the form "X County, ST"
- column AN contains a separate location as ZIP
- column G contains the same location data as AN but in the form "X County, ST"
Sub distance_from_res()
Dim oApp As MapPoint.Application
Dim k As Long
Dim count As Long
Dim errors As Long
k = 0
count = Sheets("i1_20041").Range("A2", Sheets("i1_20041").Range("A2").End(xlDown)).count
errors = 0
Set oApp = CreateObject("MapPoint.Application.NA.11")
oApp.Visible = False
Set objMap = oApp.NewMap
Dim objRes As MapPoint.Location
Dim objFish As MapPoint.Location
'Error executes code at 'LocError' and then returns to point of error.
On Error GoTo LocError
Do While k < count
If Sheets("i1_20041").Range("M2").Offset(k, 0) <> "" Then
'Sets MapPoint locations as [County],[State] from Excel sheet columns "INT_CNTY_ST" and "ZIP".
Set objRes = objMap.FindResults(Sheets("i1_20041").Range("AN2").Offset(k, 0)).Item(1)
Set objFish = objMap.FindResults(Sheets("i1_20041").Range("M2").Offset(k, 0)).Item(1)
'Calculates distance between two locations and prints it in appropriate cell in Column AO.
Sheets("i1_20041").Range("AO2").Offset(k, 0) = objRes.DistanceTo(objFish)
Else
errors = errors + 1
End If
k = k + 1
Loop
'Displays appropriate message at termination of program.
If errors = 0 Then
MsgBox ("All distance calculations were successful!")
Else
MsgBox ("Complete! Distance could not be calculated for " & errors & " of " & count & " records.")
End If
Exit Sub
LocError:
If Sheets("i1_20041").Range("G2").Offset(k, 0) = "" Then
errors = errors + 1
Else
'THIS IS WHERE THE ERROR OCCURS!
Set objRes = objMap.FindResults(Sheets("i1_20041").Range("G2").Offset(k, 0)).Item(1)
Set objFish = objMap.FindResults(Sheets("i1_20041").Range("M2").Offset(k, 0)).Item(1)
'Calculates distance between two locations and prints it in appropriate cell in Column AO.
Sheets("i1_20041").Range("AO2").Offset(k, 0) = objRes.DistanceTo(objFish)
End If
k = k + 1
Resume
End Sub
UPDATE:
I incorporated most of the suggestions from #winwaed and #Mike D, and my code is now more accurate and doesn't choke on errors. However, the old problem reared its head in a new form. Now, after around 10,000 iterations, the code continues but prints the distance of the ~10,000th record for every record afterwards. I can restart the code at the trouble point, and it will find the distances normally for those records. Why would this happen? I've posted my updated code below.
Sub distance_from_res()
Dim oApp As MapPoint.Application
Dim k As Long
Dim rc As Long
Dim errors As Long
Dim dist As Double
Dim zipRes As Range
Dim coRes As Range
Dim coInt As Range
Dim distR As Range
Set zipRes = Sheets("Sheet1").Range("C2")
Set coRes = Sheets("Sheet1").Range("B2")
Set coInt = Sheets("Sheet1").Range("E2")
Set distR = Sheets("Sheet1").Range("G2")
k = 0
rc = Sheets("Sheet1").Range("F2", Sheets("Sheet1").Range("F2").End(xlDown)).Count
errors = 0
'Start MapPoint application.
Set oApp = CreateObject("MapPoint.Application.NA.11")
oApp.Visible = False
Set objMap = oApp.NewMap
Dim objResultsRes As MapPoint.FindResults
Dim objResultsInt As MapPoint.FindResults
Dim objRes As MapPoint.Location
Dim objInt As MapPoint.Location
Do While k < rc
'Check results for Res Zip Code. If good, set first result to objRes. If not, check results for Res County,ST. If good, set first result to objRes. Else, set objRes to Nothing.
Set objResultsRes = objMap.FindResults(zipRes.Offset(k, 0))
If objResultsRes.ResultsQuality = geoFirstResultGood Then
Set objRes = objResultsRes.Item(1)
Else
Set objResultsRes = Nothing
Set objResultsRes = objMap.FindResults(coRes.Offset(k, 0))
If objResultsRes.ResultsQuality = geoFirstResultGood Then
Set objRes = objResultsRes.Item(1)
Else
If objResultsRes.ResultsQuality = geoAmbiguousResults Then
Set objRes = objResultsRes.Item(1)
Else
Set objRes = Nothing
End If
End If
End If
Set objResultsInt = objMap.FindResults(coInt.Offset(k, 0))
If objResultsInt.ResultsQuality = geoFirstResultGood Then
Set objInt = objResultsInt.Item(1)
Else
If objResultsInt.ResultsQuality = geoAmbiguousResults Then
Set objInt = objResultsInt.Item(1)
Else
Set objInt = Nothing
End If
End If
On Error GoTo ErrDist
distR.Offset(k, 0) = objRes.DistanceTo(objInt)
k = k + 1
Loop
Exit Sub
ErrDist:
errors = errors + 1
Resume Next
End Sub
You are constructing a somewhat complex range object (Range -> Offset -> Item). DIM temporary range objects and do it in steps so you can see where exactly the problem occurs
tmpR1 = Sheets("i1_20041").Range("G2")
tmpR2 = tmpR1.Offset(k,0)
then examine the .Count property of the .FindResult before you try accessing Item(1) .... maybe this item doesn't exist ?!?
Debug.Print objMap.FindResult(tmpR2).Count
Hint:
looking at your code, I observe that you use a variable "count". This variable name overlaps with the "Count" property in your second line of code - that's why the "Count" keyword at the end of the statement is printed all lowercase. It's not got anything to do with the errors (we pretend ;-) ), but bad style anyway.
MikeD is right with your dangerous FindResults() calls. However, there is a better way to check the results. The "FindResults collection" isn't a pure collection but includes an extra properties called "ResultsQuality". Docs are here:
http://msdn.microsoft.com/en-us/library/aa493061.aspx
Resultsquality returns a GeoFindResultsQuality enumeration. You want to check for the values geoAllResultsGood and geFirstResultGood. All other results should give an error of some result. Note that your existing code would work find with (for example) Ambiguous Results, even though it is unlikely the first result is the correct one. Also it might match on State or Zipcode (because that is the best it can find) whcih give you an erroneous result. Using ResultsQuality, you can detect this.
I would still check the value of Count as an additional check.
Note that your code is calculating straight line (Great Circle) distances. As such the bottleneck will be the geocoding (FindResults). If you are using the same locations a lot, then a caching mechanism could greatly speed things up.
If you want to calculate driving distances, then there are a number of products on the market for this (yes I wrote two of them!).