Do Collections need more computer power than arrays? - vba

Are Collections in VBA less efficient than arrays when it comes to long lists of strings?
My VBA tool is not as fast as I want it to be. I use a lot of Collections because I don't have to ReDim them and I don't need additional counter variables.
For example (I want to combine the array a and the collection col into one list; the tricky part is that each array element has a certain number of col elements belonging to it):
For i = 1 To col.count
    colSave.Add "==========================="
    colSave.Add a(i - 1)
    colSave.Add "==========================="
    For k = 1 To colFilter.Item(i).count
        colSave.Add col.Item(i).Item(k)
    Next k
Next i
Is it more efficient to use an array in this case, with a third counter variable?

Probably the most efficient way is to list the strings in cells on a worksheet then read the range of those cells into an array. This is a very quick method (ranges of 100k cells read in milliseconds on a reasonably fast PC):
Sub test()
Dim a() As Variant
a = Range("A1:A1000").Value
End Sub
a will now contain those strings.
Note that this method produces a two-dimensional, 1-based array (not zero-based), even for a single column, so the first string in the above example would be at a(1, 1).
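To make the array alternative from the question concrete, here is a minimal sketch rather than a definitive answer. BuildList and all of its variable names are hypothetical. It assumes a holds one entry per group and col is a Collection of Collections, as in the question's loop. It counts first, sizes the array once, and then fills it with a single position counter, so no repeated ReDim is needed:

```vba
Option Explicit

' Hypothetical helper (not from the original post): flattens the array a and
' a Collection of Collections into one 1-based String array.
Public Function BuildList(a As Variant, col As Collection) As String()
    Const SEP As String = "==========================="
    Dim result() As String
    Dim grp As Variant, entry As Variant
    Dim total As Long, pos As Long, i As Long

    ' First pass: each group needs three header lines plus its own items.
    For Each grp In col
        total = total + 3 + grp.Count
    Next grp
    If total = 0 Then Exit Function

    ' Size the array once, then fill it; pos is the extra counter variable.
    ReDim result(1 To total)
    For Each grp In col
        result(pos + 1) = SEP
        result(pos + 2) = a(LBound(a) + i)
        result(pos + 3) = SEP
        pos = pos + 3
        i = i + 1
        For Each entry In grp
            pos = pos + 1
            result(pos) = entry
        Next entry
    Next grp

    BuildList = result
End Function
```

Whether this is measurably faster than the Collection version depends on the list sizes, so it is worth timing both on real data. The sketch uses For Each instead of col.Item(i) because indexed access to a Collection is commonly reported to get slower as the Collection grows.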

Related

VBA: Big Data and the use of Arrays

So I'm working with large data sets of 28,000-plus lines, plus possibly 5 other spreadsheets to cross-reference against.
I keep being told arrays are faster, but can someone explain why? They seem to be faster where you can read and write large chunks of data into the array at a time, which is a case where I can understand there might be a reduction in overhead.
Or is it right to say arrays are just plain faster than, say:
Worksheet.range("A1").Value=AOtherWorksheet.range("A1").Value
It just appears somewhat magical if that's the case. I can see why reading in blocks of variants would be faster, but I don't see why reading off a sheet into an array and then from the array into a second sheet would be faster. Have I misunderstood? I'm just trying to tease that specific part out.
Any other tricks or comments for automating large spreadsheets are welcome, but I'm mainly focused on understanding this tidbit.
I think the magic is caused by complexity - each cell carries with it a lot of "baggage"
Hundreds of settings for its environment, and most of them are about cell formatting
Height, Width, RowHeight, ColumnWidth, Column, Row, Font, IndentLevel, etc
To see all properties, observe the watch window for Sheet1.Range("A1")
(properties with a + next to them are complex objects with their own set of properties)
The main reason for optimizing with arrays is to avoid all formatting
Each cell "knows" about all settings regardless of whether they are changed or not, and carries all this "weight" around. Most users, most of the time, only care about the value in the cell and never touch the formatting. On rare occasions you may be stuck working directly with the range object if you need to modify each individual cell's .Borders, .Interior.Color, .Font, etc., and even then there are ways of grouping similarly formatted cells and modifying the attributes of the entire group at once.
To continue with the baggage analogy (and this is stretching it a bit): at an airport, if I need to refill a pen for passenger "John Doe" from his luggage already on the plane, in a utility room at the back of the airport, I will be able to do it (I have all the info I need), but it'll take me time to go back and forth, carrying that luggage. For one passenger it can be done in a reasonable amount of time, but how much longer would it take to refill 20K pens, or 100K, or a million? (ONE - BY - ONE)
I view the Range <-> VBA interaction the same way: working with individual cells one at a time is like carrying each individual piece of luggage, for a million passengers, to the utility room at the back of the airport. This is what this statement does:
Sheet1.Range("A1:A1048576").Value = Sheet2.Range("A1:A1048576").Value
as opposed to extracting all pens from all suitcases at once, refilling them, and placing them all back.
Copying the range object to an array isolates one of the properties of each cell, its Value ("the pen"), from all the other settings (Excel is extremely efficient about this). We now have an array of only the values, and no formatting settings. Modify each value individually in memory, then place them all back into the range object:
Dim arr As Variant
arr = Sheet2.Range("A1:A1048576").Value 'Read all values from Sheet2 into the array
Sheet1.Range("A1:A1048576").Value = arr 'Write the whole array to Sheet1 in one step
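For anyone who wants to measure the difference on their own machine, here is a rough timing sketch. CompareCellLoopWithArray, the row count N, and the doubling step are all assumptions made for illustration, and Sheet1 is assumed to be a sheet that can safely be overwritten:

```vba
Option Explicit

' Rough timing sketch (hypothetical names): writes N values cell by cell,
' then updates the same N values through a Variant array.
Public Sub CompareCellLoopWithArray()
    Const N As Long = 20000
    Dim arr As Variant
    Dim i As Long
    Dim t As Double

    ' Variant 1: one property call per cell.
    t = Timer
    For i = 1 To N
        Sheet1.Cells(i, 1).Value = i
    Next i
    Debug.Print "Cell by cell: "; Format$(Timer - t, "0.000"); " sec"

    ' Variant 2: one read, changes in memory, one write.
    t = Timer
    arr = Sheet1.Range("A1").Resize(N, 1).Value   ' 2-D array: 1 To N, 1 To 1
    For i = 1 To N
        arr(i, 1) = arr(i, 1) * 2
    Next i
    Sheet1.Range("A1").Resize(N, 1).Value = arr
    Debug.Print "Via array:    "; Format$(Timer - t, "0.000"); " sec"
End Sub
```

The absolute numbers will vary with the workbook (formulas, events, screen updating), so only the ratio between the two is really interesting.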
This is where the Copy / Paste parameters are different as well:
Sheet2.Range("A1:A1048576").Copy
Sheet1.Range("A1:A1048576").PasteSpecial xlPasteAll
Timers for Rows: 1,048,573
xlPasteAll - Time: 0.629 sec; (all values + all formatting)
xlPasteAllExceptBorders - Time: 0.791 sec
xlPasteAllMergingConditionalFormats - Time: 0.782 sec; (no merged cells)
xlPasteAllUsingSourceTheme - Time: 0.791 sec
xlPasteColumnWidths - Time: 0.004 sec
xlPasteComments - Time: 0.000 sec; (comments test is too slow)
xlPasteFormats - Time: 0.497 sec; (format only, no values, no brdrs)
xlPasteFormulas - Time: 0.718 sec
xlPasteFormulasAndNumberFormats - Time: 0.775 sec
xlPasteValidation - Time: 0.000 sec
xlPasteValues - Time: 0.770 sec; (conversion from formula to val)
xlPasteValuesAndNumberFormats - Time: 0.634 sec
Another aspect, beyond arrays, is the kind of index a data structure offers.
For most situations arrays are acceptable, but when lookups need to be faster there are the Dictionary and Collection objects.
One array inefficiency is that finding an element means iterating over the items until it turns up.
A keyed structure lets us access a specific item directly, which is a lot faster:
Dim d As Object 'Late-bound Dictionary: items are looked up by key, not by position
Set d = CreateObject("Scripting.Dictionary") 'From the Microsoft Scripting Runtime, not built into VBA
d.Add Key:="John Doe", Item:="31" 'John Doe - 31 (age); index is based on Key
d.Add Key:="Jane Doe", Item:="33"
Debug.Print d("Jane Doe") 'Prints 33
Dictionaries also have the very useful and fast d.Exists("John Doe") method, which returns True or False without raising an error (Collections have no built-in equivalent). With an array you'd have to loop over potentially all items to find out.
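To illustrate the difference, this is a minimal sketch of the usual workaround for a Collection, which has to trap the run-time error that a missing key raises. KeyExists and DemoExists are hypothetical names, not part of VBA:

```vba
Option Explicit

' Hypothetical helper: a Collection has no Exists method, so the lookup
' error for a missing key is trapped instead.
Public Function KeyExists(col As Collection, ByVal key As String) As Boolean
    Dim dummy As Boolean
    On Error Resume Next
    Err.Clear
    dummy = IsObject(col.Item(key))   ' raises an error if the key is missing
    KeyExists = (Err.Number = 0)
    On Error GoTo 0
End Function

Public Sub DemoExists()
    Dim c As New Collection
    Dim d As Object

    c.Add Item:="31", key:="John Doe"
    Debug.Print KeyExists(c, "John Doe")   ' key was added above
    Debug.Print KeyExists(c, "Jane Doe")   ' key was never added

    ' The Dictionary equivalent needs no error handling.
    Set d = CreateObject("Scripting.Dictionary")
    d.Add "John Doe", "31"
    Debug.Print d.Exists("John Doe")
    Debug.Print d.Exists("Jane Doe")
End Sub
```

IsObject is used only so that the item is evaluated without assigning it anywhere, which keeps the helper working for both object and non-object items.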
I think one of the fastest ways to extract unique values for large columns is to combine arrays and dictionaries:
Public Sub ShowUniques()
    With Sheet1.UsedRange
        GetUniques .Columns("A"), .Columns("B")
    End With
End Sub
Public Sub GetUniques(ByRef dupesCol As Range, uniquesCol As Range)
    'd is declared As Object (late bound), so no library reference is required
    Dim arr As Variant, d As Object, i As Long, itm As Variant
    arr = dupesCol.Value 'Read the whole column into a 2-D array
    Set d = CreateObject("Scripting.Dictionary")
    For i = 1 To UBound(arr)
        d(arr(i, 1)) = 0 'Shortcut to add new items to dictionary, ignoring dupes
    Next
    uniquesCol.Resize(d.Count) = Application.Transpose(d.Keys)
    'Or - Place d itms in new array (resized accordingly), and place array back on Range
    ' ReDim arr(1 To d.Count, 1 To 1)
    ' i = 1
    ' For Each itm In d
    '     arr(i, 1) = itm
    '     i = i + 1
    ' Next
    ' uniquesCol.Resize(d.Count) = arr
End Sub
From: Col A    To: Col B
1              1
2              2
1              3
3
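Dictionaries don't accept duplicate keys: d.Add raises an error for a key that already exists, while the d(key) = value assignment used above simply overwrites the existing entry, so duplicates are effectively ignored.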

Outputting rows into another sheet

I have two sets of data stored in two different sheets. I need to run an analysis which prints out the non-duplicate rows (i.e. row is present in one and not the other) found in the sheets and print them in a new sheet.
I can do the comparison fine - it is relatively simple with ranges and the For Next method. I currently store the non-duplicates in two different collections, each representing the non-duplicates in each sheet. However I am having trouble deciding how to proceed with pasting the non-duplicate rows on the new sheet.
I thought about storing the entire row into a collection but printing the row out of the collection in the new sheet seems non-trivial: I would have to determine the size of the collection, set the appropriate range and then iterate through the collection and print them out. I would also like to truncate this data which would add another layer of complexity.
The other method I thought was simply storing the row number and using Range.Select.Copy and PasteSpecial. The advantage of this is that I can truncate however much I wish, however this seems incredibly hacky to me (essentially using VBA to simulate user input) and I am not sure on performance hits.
What are the relative merits or is there a better way?
I have been tackling a similar problem at work this week. I have come up with two methods:
First you could simply iterate through each collection one row at a time, and copy the values to the new sheet:
Function PasteRows1(ByRef srcRows As Collection, ByRef dst As Worksheet)
    Dim row As Range
    Dim curRow As Long 'Long, not Integer: Integer overflows past row 32,767
    curRow = 1
    For Each row In srcRows
        dst.Rows(curRow).Value = row.Value
        curRow = curRow + 1
    Next
End Function
This has the benefit of not using the Range.Copy method and so the user's clipboard is preserved. If you are not copying an entire row then you will have to create a range that starts at the first cell of the row and then resize it using Range.Resize. So the code inside the for loop would roughly be:
Dim firstCellInRow As Range
Set firstCellInRow = dst.Cells(curRow, 1)
firstCellInRow.Resize(1, row.Columns.Count).Value = row.Value
curRow = curRow + 1
The second method I thought of uses the Range.Copy. Like so:
Function PasteRows2(ByRef srcRows As Collection, ByRef dst As Worksheet)
    Dim row As Range
    Dim disjointRange As Range
    For Each row In srcRows
        If disjointRange Is Nothing Then
            Set disjointRange = row
        Else
            Set disjointRange = Union(disjointRange, row)
        End If
    Next
    If disjointRange Is Nothing Then Exit Function 'Nothing to copy for an empty collection
    disjointRange.Copy
    dst.Paste
End Function
While this does use the .Copy method, it also allows you to copy all of the rows in one shot, which is nice because you avoid partial copies if Excel ever crashes in the middle of your macro.
Let me know if either of these methods satisfy your needs :)
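A possible third variant, offered only as a sketch: gather the values into one 2-D array and write that array to the sheet in a single assignment. This leaves the clipboard alone and also covers the truncation requirement. PasteRows3 and its nCols parameter are hypothetical; the sketch assumes every item in srcRows is a single-row Range and that output should start at A1 of dst:

```vba
Option Explicit

' Hypothetical third variant: copies the first nCols values of every row
' into one array and writes the array with a single assignment.
Public Sub PasteRows3(srcRows As Collection, dst As Worksheet, ByVal nCols As Long)
    Dim out() As Variant
    Dim vals As Variant
    Dim rowRange As Range
    Dim r As Long, c As Long

    If srcRows.Count = 0 Or nCols < 1 Then Exit Sub
    ReDim out(1 To srcRows.Count, 1 To nCols)

    For Each rowRange In srcRows
        r = r + 1
        ' Truncate the row to nCols columns before reading its values.
        vals = rowRange.Resize(1, nCols).Value
        If nCols = 1 Then
            out(r, 1) = vals          ' a single cell yields a scalar, not an array
        Else
            For c = 1 To nCols
                out(r, c) = vals(1, c)
            Next c
        End If
    Next rowRange

    ' One write for the whole block.
    dst.Range("A1").Resize(srcRows.Count, nCols).Value = out
End Sub
```

A call such as PasteRows3 myRows, Worksheets("Output"), 5 would keep the first five columns of each stored row; the collection and sheet name here are placeholders.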

Write String() array to excel excluding blank values

Is there any functionality that would allow me to write a two dimensional string array to an Excel worksheet, but only "overwrite" the cells in Excel if the corresponding value isn't an empty string?
I have an application that does this...it pulls data from our database, puts it in a two dimensional array, and writes the whole array to an excel template. The columns written to are A through S, but in column Q there is a formula which is currently being overwritten, as it isn't pulled from the database.
I'd hate to hardcode the formula in the program, so I'm hopeful that there is a way to accomplish what I'm trying to do. The only idea I've had is to attempt to read the range from the worksheet first, but I'm curious to know if there is some kind of functionality that will handle this already.
EDIT: To be clear, this is a VB.Net program writing to Excel.
Edit: this was written on the assumption that the question was about VBA.
You could do it like this, which uses INDEX to extract the columns of the array after column Q.
Sub Sample()
Dim X
'create 2D array matching your size
X = Range("A1:S10").Value2
Range("A1:P10").Value2 = X
'add 2nd last column to R
Range("R1:R10").Value2 = Application.Index(X, 0, UBound(X, 2) - 1)
'add last column to S
Range("S1:S10").Value2 = Application.Index(X, 0, UBound(X, 2))
End Sub
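The asker's own idea of reading the range first can also be sketched in a few lines of VBA: save the formulas in column Q, write the whole array, then put the formulas back. WriteKeepingColumnQ is a hypothetical name, and the sketch assumes data is a 2-D array with 19 columns (A through S) that should land at A1 of the active sheet:

```vba
Option Explicit

' Hypothetical helper: writes a 19-column array to A:S but restores
' whatever formulas column Q held before the write.
Public Sub WriteKeepingColumnQ(data As Variant)
    Dim target As Range
    Dim saved As Variant
    Dim nRows As Long

    nRows = UBound(data, 1) - LBound(data, 1) + 1
    Set target = ActiveSheet.Range("A1").Resize(nRows, 19)   ' columns A to S

    saved = target.Columns(17).Formula    ' column Q, formulas as text
    target.Value2 = data                  ' overwrites everything, including Q
    target.Columns(17).Formula = saved    ' put the formulas back
End Sub
```

This costs one extra read and one extra write of a single column, which should be small next to the main write. The same three steps should carry over to VB.Net through the Excel object model, though that is untested here.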

How to paste part of two dimensional array into worksheet range without using loop?

Pasting the whole array is straightforward:
Sub TestPasteArrayIntoRange()
Dim MyArray As Variant
MyArray = [{1,2;3,4;5,101}]
selection.Resize(UBound(MyArray, 1), UBound(MyArray, 2)) = MyArray
End Sub
But what about pasting only the second row? (The first row can be pasted simply by reducing the selection size to one row.) I've seen an answer for the one-dimensional case that uses the Excel INDEX function, but it doesn't apply here.
My first idea was to copy the array into a range and then copy the row from it, but it seems that I cannot create a range object that is not part of some worksheet object (in particular, I cannot create a virtual range without a virtual worksheet).
I think you will just have to create another function, e.g. GetRow() or GetColumn(), to return a reduced vector (or array) with the selected values you need. There are some functions here that you can use for your purpose: http://www.cpearson.com/excel/vbaarrays.htm
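A minimal sketch of what such a GetRow() helper might look like follows. It is written from scratch for illustration and is not taken from the linked page. The loop does not disappear, it just moves into the helper, but the worksheet is still written with one assignment:

```vba
Option Explicit

' Hypothetical helper: returns one row of a 2-D array as a 1 x n array
' that can be assigned directly to a single-row range.
Public Function GetRow(arr As Variant, ByVal rowIndex As Long) As Variant
    Dim result() As Variant
    Dim c As Long

    ReDim result(1 To 1, LBound(arr, 2) To UBound(arr, 2))
    For c = LBound(arr, 2) To UBound(arr, 2)
        result(1, c) = arr(rowIndex, c)
    Next c
    GetRow = result
End Function

Public Sub TestPasteSecondRow()
    Dim MyArray As Variant, secondRow As Variant
    Dim nCols As Long

    MyArray = [{1,2;3,4;5,101}]
    secondRow = GetRow(MyArray, 2)        ' holds the values 3 and 4
    nCols = UBound(secondRow, 2) - LBound(secondRow, 2) + 1
    Selection.Resize(1, nCols).Value = secondRow
End Sub
```

A GetColumn() counterpart would follow the same pattern with the two dimensions swapped.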

Pass a range into a custom function from within a cell

Hi I'm using VBA in Excel and need to pass in the values from two ranges into a custom function from within a cell's formula. The function looks like this:
Public Function multByElement(range1 As String, range2 As String) As Variant
Dim arr1() As Variant, arr2() As Variant
arr1 = Range(range1).value
arr2 = Range(range2).value
If UBound(arr1) = UBound(arr2) Then
Dim arrayA() As Variant
ReDim arrayA(LBound(arr1) To UBound(arr1))
For i = LBound(arr1) To UBound(arr1)
arrayA(i) = arr1(i) * arr2(i)
Next i
multByElement = arrayA
End If
End Function
As you can see, I'm trying to pass the string representation of the ranges. In the debugger I can see that they are properly passed in and the first visible problem occurs when it tries to read arr1(i) and shows as "subscript out of range". I have also tried passing in the range itself (ie range1 as Range...) but with no success.
My best suspicion was that it has to do with the Active Sheet since it was called from a different sheet from the one with the formula (the sheet name is part of the string) but that was dispelled since I tried it both from within the same sheet and by specifying the sheet in the code.
BTW, the formula in the cell looks like this:
=AVERAGE(multByElement("A1:A3","B1:B3"))
or
=AVERAGE(multByElement("My Sheet1!A1:A3","My Sheet1!B1:B3"))
for when I call it from a different sheet.
First, see the comment Remou left, since that's really what you should be doing here. You shouldn't need VBA at all to get an element-wise multiplication of two arrays.
Second, if you want to work with Ranges, you can do that by declaring your function arguments to be of type Range. So you could have
Public Function multByElement(range1 As Range, range2 As Range)
and not need to resolve strings to range references yourself. Using strings prevents Excel from updating references as things get moved around in your worksheet.
Finally, the reason your function fails the way it does is that the array you get from taking the 'Value' of a multi-cell Range is two-dimensional, so you need to access its elements with two indices. Since it looks like you're intending to (element-wise) multiply two vectors, you would do either
arrayA(i) = arr1(i,1) * arr2(i,1)
or
arrayA(i) = arr1(1,i) * arr2(1,i)
depending on what orientation you expected from your input. (Note that if you do this with VBA, orientation of what is conceptually a 1-D array matters, but if you follow Remou's advice above, Excel will do the right thing regardless of whether you pass in rows or columns, or range references or array literals.)
As an epilogue, it also looks like you're not using 'Option Explicit'. Google around for some rants on why you probably always want to do this.
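Putting those points together, a corrected version might look roughly like the sketch below. It keeps the asker's function name, takes Range arguments, and assumes both inputs are single-column ranges with the same number of rows (at least two). The input checks and the #VALUE! return are additions for illustration:

```vba
Option Explicit

' Sketch of a corrected multByElement for two single-column ranges.
Public Function multByElement(range1 As Range, range2 As Range) As Variant
    Dim arr1 As Variant, arr2 As Variant
    Dim result() As Variant
    Dim i As Long

    ' Reject shapes this sketch does not handle (a single cell would
    ' return a scalar from .Value instead of an array).
    If range1.Columns.Count <> 1 Or range2.Columns.Count <> 1 _
       Or range1.Rows.Count <> range2.Rows.Count _
       Or range1.Rows.Count < 2 Then
        multByElement = CVErr(xlErrValue)
        Exit Function
    End If

    arr1 = range1.Value          ' 2-D, 1-based: arr1(row, 1)
    arr2 = range2.Value
    ReDim result(1 To UBound(arr1, 1), 1 To 1)
    For i = 1 To UBound(arr1, 1)
        result(i, 1) = arr1(i, 1) * arr2(i, 1)
    Next i

    multByElement = result
End Function
```

The cell formula then passes the ranges directly, for example =AVERAGE(multByElement(A1:A3,B1:B3)), so Excel can update the references when cells move.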