Repeatedly checking entire scripting dictionary via loop bad practice? VB6 - vba

I have a loop that checks for values and adds them to a dictionary if they do not exist. This check happens inside of a loop so it happens repeatedly. As the dictionary grows I imagine there is an inner loop going on with each check that is becoming costly. In an effort to speed up my entire routine I am wondering if this might be something I should optimize? If so, how?
keycnt = 1
For x = 1 To 500000
    STRVAL = returnarray(8, x)
    If Not STRDIC.Exists(STRVAL) Then
        STRDIC.Add STRVAL, keycnt
        keycnt = keycnt + 1
    End If
Next x

If your dictionary is a Scripting.Dictionary and not some [poorly implemented] custom-made data structure class, then a key lookup should be O(1) complexity, not O(n) as you seem to imply; the growing number of keys has no impact on performance.
Key lookup in a hash table or dictionary is basically free, the only thing to fix in that code is the indentation.

Related

Is this an incorrect way of iterating over a dictionary?

Are there any problems with iterating over a dictionary in the following manner?
Dim dict As New Dictionary(Of String, Integer) From {{"One", 1}, {"Two", 2}, {"Three", 3}}
For i = 0 To dict.Count - 1
    Dim Key = dict.Keys(i)
    Dim Value = dict.Item(Key)
    'Do more work
    dict.Item(Key) = NewValue
Next
I have used it a lot without any problems. But I recently read that the best way to iterate over a dictionary was using a ForEach loop. This led me to question the method that I've used.
Update: Note I am not asking how to iterate over a dictionary, but rather if the method that I've used successfully in the past is wrong and if so why.
Are there any problems with iterating over a dictionary in the following manner?
Yes and no. Technically there's nothing inherently wrong with the way you're doing it as it does what you need it to do, BUT it requires unnecessary computations and is therefore slower than simply using a For Each loop and iterating the key/value-pairs.
Iterating keys, then fetching value
The Keys property is not a separate collection of keys, but is actually just a thin wrapper around the dictionary itself which contains an enumerator for enumerating the keys only. For this reason it also doesn't have an indexer that lets you access the key at a specific index like you are right now.
What's actually happening is that VB.NET is utilizing the extension method ElementAtOrDefault(), which works by stepping through the enumeration until the wanted index has been reached. This means that for every iteration of your main loop, ElementAtOrDefault() also performs a similar step-through iteration until it gets to the index you've specified. You now have two loops, resulting in an O(N * N) = O(N^2) operation.
What's more, when you access the value via Item(Key) it has to calculate the hash of the key and determine the respective value to fetch. While this operation is close to O(1), it's still an unnecessary additional operation compared to what I'm talking about below.
Iterating key/value-pairs
The dictionary already has an internal list (array) holding the keys and their respective values, so when iterating the dictionary using a For Each loop all it does is fetch each pair and put them into a KeyValuePair. Since it is fetching directly by index this time (at a specific memory location) you only have one loop, thus the fetch operation is O(1), making your entire loop O(N * 1) = O(N).
Based on this we see that iterating the key/value-pairs is actually faster.
This kind of loop would look like (where kvp is a KeyValuePair(Of String, Integer)):
For Each kvp In dict
Dim Key = kvp.Key
Dim Value = kvp.Value
Next
See here:
https://www.dotnetperls.com/dictionary-vbnet
Keys. You can get a List of the Dictionary keys. Dictionary has a get accessor property with the identifier Keys. You can pass the Keys to a List constructor to obtain a List of the keys.
It cites an example similar to yours:
Module Module1
    Sub Main()
        ' Put four keys and values in the Dictionary.
        Dim dictionary As New Dictionary(Of String, Integer)
        dictionary.Add("please", 12)
        dictionary.Add("help", 11)
        dictionary.Add("poor", 10)
        dictionary.Add("people", -11)
        ' Put keys into List Of String.
        Dim list As New List(Of String)(dictionary.Keys)
        ' Loop over each string.
        Dim str As String
        For Each str In list
            ' Print string and also Item(string), which is the value.
            Console.WriteLine("{0}, {1}", str, dictionary.Item(str))
        Next
    End Sub
End Module

Unexpected OutOfMemoryException in ILNumerics

The following VB.NET code gives me an out of memory exception. Does anybody know why?
Dim vArray As ILArray(Of Double) = ILMath.rand(10000000)
Using ILScope.Enter(vArray)
    For i As Integer = 1 To 100
        vArray = ILMath.add(vArray, vArray)
    Next
End Using
Thank you very much.
In this toy example you can simply remove the artificial scope and it will run fine:
Dim vArray As ILArray(Of Double) = ILMath.rand(10000000)
For i As Integer = 1 To 100
    vArray = ILMath.add(vArray, vArray)
Next
Console.WriteLine("OK: " + vArray(0).ToString())
Console.ReadKey()
However, in a more serious situation, ILScope will be your friend. As stated on the ILNumerics page an artificial scope ensures a deterministic memory management:
All arrays created inside the scope are disposed once the block was left.
Otherwise one would have to rely on the GC for cleanup. And, as you know, this involves a gen 2 collection for large objects, with all the disadvantages in terms of performance.
In order to be able to dispose the arrays they need to be collected and tracked somehow. Whether or not this qualifies for the term 'memory leak' is rather a philosophical question. I will not go into it here. The deal is: after the instruction pointer runs out of the scope these arrays are taken care of: their memory is put into the memory pool and will be reused. As a consequence, no GC will be triggered.
The scheme is especially useful for long-running operations and for large data. Currently, the arrays are released only AFTER the scope block has been left. So if you create an algorithm or loop which requires more memory than is available on your machine, you need to clean up DURING the loop already:
Dim vArray As ILArray(Of Double) = ILMath.rand(10000000)
For i As Integer = 1 To 100
    Using ILScope.Enter
        vArray.a = ILMath.add(vArray, vArray)
        ' ...
    End Using
Next
Here, the scope cleans up the memory after each iteration of the loop. This affects all local arrays assigned within the loop body. If we want an array value to survive the loop iteration we can assign to its .a property as shown with vArray.a.

Is Try/Catch ever LESS expensive than a hash lookup?

I'm aware that exception trapping can be expensive, but I'm wondering if there are cases when it's actually less expensive than a lookup?
For example, if I have a large dictionary, I could either test for the existence of a key:
If MyDictionary.ContainsKey(MyKey) Then _
    MyValue = MyDictionary(MyKey) ' This is 2 lookups just to get the value.
Or, I could catch an exception:
Try
    MyValue = MyDictionary(MyKey) ' Only doing 1 lookup now.
Catch e As Exception
    ' Didn't find it.
End Try
Is exception trapping always more expensive than lookups like the above, or is it less so in some circumstances?
The thing about dictionary lookups is that they happen in constant or near-constant time. It takes your computer about the same amount of time whether your dictionary holds one item or one million items. I bring this up because you're worried about making two lookups in a large dictionary, and reality is that it's not much different from making two lookups in a small dictionary. As a side note, one of the implications here is that dictionaries are not always the best choice for small collections, though I normally find the extra clarity still outweighs any performance issues for those small collections.
One of the things that determines just how fast a dictionary can make its lookups is how long it takes to generate a hash value for a particular object. Some objects can do this much faster than others. That means the answer here depends on the kind of object in your dictionary. Therefore, the only way to know for sure is to build a version that tests each method a few hundred thousand times to find out which completes the set faster.
Another factor to keep in mind here is that it's mainly just the Catch block that is slow with exception handling, and so you'll want to look for the right combination of lookup hits and misses that reasonably matches what you'd expect in production. For this reason, you can't find a general guideline here, or if you do it's likely to be wrong. If you only rarely have a miss, then I would expect the exception handler to do much better (and, by virtue of a miss being somewhat, well, exceptional, it would also be the right solution). If you miss more often, I might prefer a different approach.
And while we're at it, let's not forget about Dictionary.TryGetValue()
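As a minimal sketch of what that looks like (the dictionary contents and variable names here are made up for illustration, not taken from the question), TryGetValue does the existence check and the fetch in a single lookup and never throws on a miss:

```vbnet
Imports System
Imports System.Collections.Generic

Module TryGetValueSketch
    Sub Main()
        ' Hypothetical sample data, for illustration only.
        Dim myDictionary As New Dictionary(Of String, Integer) From {{"One", 1}, {"Two", 2}}

        Dim myValue As Integer

        ' One lookup: returns True and fills myValue when the key exists.
        If myDictionary.TryGetValue("Two", myValue) Then
            Console.WriteLine("Found: " & myValue)
        End If

        ' A miss returns False without throwing; myValue is set to the type's default (0).
        If Not myDictionary.TryGetValue("Three", myValue) Then
            Console.WriteLine("Not found")
        End If
    End Sub
End Module
```

Run as a console application, this should print "Found: 2" followed by "Not found".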
I tested the performance of ContainsKey vs. Try/Catch, both with and without the debugger attached. [Result screenshots not preserved.]
Tested on Release build of a Console application with just the Sub Main and below code. ContainsKey is ~37000 times faster with debugger and still 355 times faster without debugger attached, so even if you do two lookups, it would not be as bad as if you needed to catch an extra exception. This is assuming you are looking for missing keys quite often.
Dim dict As New Dictionary(Of String, Integer)
With dict
    .Add("One", 1)
    .Add("Two", 2)
    .Add("Three", 3)
    .Add("Four", 4)
    .Add("Five", 5)
    .Add("Six", 6)
    .Add("Seven", 7)
    .Add("Eight", 8)
    .Add("Nine", 9)
    .Add("Ten", 10)
End With
Dim stw As New Stopwatch
Dim iterationCount As Long = 0
Do
    stw.Start()
    If Not dict.ContainsKey("non-existing key") Then 'always true
        stw.Stop()
        iterationCount += 1
    End If
    If stw.ElapsedMilliseconds > 5000 Then Exit Do
Loop
Dim stw2 As New Stopwatch
Dim iterationCount2 As Long = 0
Do
    Try
        stw2.Start()
        Dim value As Integer = dict("non-existing key") 'always throws exception
    Catch ex As Exception
        stw2.Stop()
        iterationCount2 += 1
    End Try
    If stw2.ElapsedMilliseconds > 5000 Then Exit Do
Loop
MsgBox("ContainsKey: " & iterationCount / 5 & " per second, TryCatch: " & iterationCount2 / 5 & " per second.")
If you are trying to find an item in a data structure of some kind which is not easily searched (e.g. finding an item containing the word "flabbergasted" in an unindexed string array of 100K items), then yes, letting it throw the exception would be faster because you'd only be doing the look-up once. If you check if the item exists first, then get the item, you are doing the look-up twice. However, in your example, where you are looking up an item in a dictionary (hash table), it should be very quick, so doing the lookup twice would likely be faster than letting it fail, but it's hard to say without testing it. It all depends on how quickly the hash value for the object can be calculated and how many items in the list share the same hash value.
As others have suggested, in the case of the Dictionary, the TryGetValue would provide the best of both methods. Other list types offer similar functionality.

For Each loop enumerator expression and memory consumption

According to the language specification guide for VB.NET Section 10.9.3
The enumerator expression in a for each loop is copied over into memory.
If I have a list of 10000 objects that list will be in memory twice for the code below?
Dim myList As New List(Of bobs)
'put 10000 bobs in my list
For Each x In myList
    'do something
Next
If I were generating the list from a linqQuery or some other such query it would make sense to generate that list at the for each loop statement thus not having the list in memory twice for example.
For Each x In myList.Where(Function(b) b.name = Y)
    'do something
Next
If the LINQ query is unreadable on the for each loop, do I forgo readability and just put it on the for each loop declaration line?
Should I declare the list in its own variable and just bite the bullet and have the list exist twice in memory?
that list will be in memory twice for the code below
No, it won't. In your case, the spec is talking about the variable "x" here, not the entire collection. Remember, an enumerable (any IEnumerable<T> or similar) doesn't necessarily even have items in memory. When created via an iterator in C#, for example, you can have "collections" that are generated as you enumerate over them. There isn't a "list" of objects (necessarily) that could be copied, even if the language wanted to do so.
If the LINQ query is unreadable on the for each loop
In many cases, I prefer filtering this way. You can just as easily move this outside of the loop, if you want to make it more clear, as well:
Dim filteredCollection = myList.Where(Function(x) x.name = Y)
For Each x In filteredCollection
    'do something
Next
There is no disadvantage to doing this if you find it more readable.
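To illustrate why no second copy of the list is involved, here is a small sketch (the list contents and names are invented for the example). Where only builds a lazy query over the original list; items are filtered one at a time as the For Each loop pulls them, which is why an element added after the query was defined still shows up:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DeferredWhereSketch
    Sub Main()
        ' Hypothetical sample data, for illustration only.
        Dim myList As New List(Of String) From {"bob", "alice", "bob"}

        ' Where builds a lazy query; nothing is filtered or copied at this point.
        Dim filtered = myList.Where(Function(n) n = "bob")

        ' Added after the query was defined, yet still seen when it is enumerated.
        myList.Add("bob")

        Dim matches As Integer = 0
        For Each item In filtered
            matches += 1
        Next

        Console.WriteLine(matches)
    End Sub
End Module
```

This should print 3. Calling ToList() on the query would be the point at which a second, materialized list actually gets created.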

Comparing an item in a list against other items in the same list in VB.NET

Simplified, I have a List(Of MyObj), and I want to iterate through that list and compare each element to all other elements in the same list, excluding (if possible) the same element. I have a solution that works, but it's slow and uses double For loops. It may possibly have also summoned Cthulhu from his sleep.
Is there a better approach? Linq, perhaps? Or some fancy algorithm? This below is a sanitized version of what I have:
Dim MyList As New List(Of MyObj)({Obj1, Obj2, Obj3, Obj4, Obj5, Obj6})
If MyList.Count > 0 Then
    For i = 0 To (MyList.Count - 1) Step 1
        For j As Int32 = 0 To (MyList.Count - 1) Step 1
            If MyList(i).GetHashCode = MyList(j).GetHashCode Then
                Continue For
            Else
                If MyList(i).SomeFunction(MyList(j)) Then
                    Call DoSomething()
                End If
            End If
        Next j
    Next i
Else
    ' Error Code Here.
End If
This will work in O(M*N) where N is ObjCount and M is the number of non-duplicate objects. Your current solution is O(N^2).
You need a Hash Function. You can determine whether GetHashCode will suffice or whether you need to implement Sha1.
Instantiate a HashSet (or HashTable, depending on your Hash Function)
Add each object, if it does not already exist, into the HashSet or HashTable.
For each object in the HashSet, execute SomeFunction() against every other object in the HashSet. If you dump into an array and iterate via indexes, you only have to compare indexes, rather than objects.
For i As Integer = 0 To MyHashResultsArray.Count - 1
    For j As Integer = 0 To MyHashResultsArray.Count - 1
        If i <> j Then
            MyHashResultsArray(i).DoSomething(j)
        End If
    Next
Next
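A rough sketch of the de-duplication step described above, using strings as a stand-in for the real objects (the sample data and names are assumptions, and for a custom class HashSet would rely on that class's own GetHashCode and Equals):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DedupeSketch
    Sub Main()
        ' Hypothetical sample data with duplicates, for illustration only.
        Dim items As New List(Of String) From {"a", "b", "a", "c", "b"}

        ' HashSet drops duplicates; dumping it into an array allows index comparison.
        Dim unique As String() = New HashSet(Of String)(items).ToArray()

        Dim comparisons As Integer = 0
        For i As Integer = 0 To unique.Length - 1
            For j As Integer = 0 To unique.Length - 1
                ' Comparing indexes is enough to skip an element against itself.
                If i <> j Then comparisons += 1
            Next
        Next

        Console.WriteLine(unique.Length & " unique, " & comparisons & " comparisons")
    End Sub
End Module
```

With five items reduced to three unique ones, this should print "3 unique, 6 comparisons" instead of the 20 comparisons the full list would need.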
Important
This is only a good solution IF there is a significant number of duplicates; a duplicate level of perhaps 10% would be necessary to consider this solution, except for very large values of N. If N gets too large, a re-engineering of the application may be necessary to avoid the need for M actions against M objects.
Edit
Much of the comment discussion was based upon my misunderstanding of the Author's needs regarding the DoSomething() function.
Barring any potential problems with using GetHashCode to check for object equality (best not to do this - it'll only bite you at some point - and it's probably this that has awakened Cthulhu!), your solution is about as fast as it's likely to get.
Sure, you can tweak it, but it will remain O(N^2), that is, the runtime will be of the order of the square of the number of elements in your list. If you double the number of elements, your runtime will increase by a factor of 4.
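One possible tweak, sketched here under assumptions that are not in the question: skip the element itself by comparing indexes rather than hash codes, and, if SomeFunction happens to be symmetric, start the inner loop at i + 1 so each pair is visited only once. The SomeFunction below is a hypothetical stand-in working on integers.

```vbnet
Imports System
Imports System.Collections.Generic

Module PairwiseSketch
    ' Hypothetical stand-in for MyObj.SomeFunction: True when the two values are equal.
    Function SomeFunction(a As Integer, b As Integer) As Boolean
        Return a = b
    End Function

    Sub Main()
        ' Hypothetical sample data, for illustration only.
        Dim myList As New List(Of Integer) From {1, 2, 1, 3, 2}
        Dim hits As Integer = 0

        For i As Integer = 0 To myList.Count - 1
            ' Starting at i + 1 skips the element itself and every pair already visited.
            ' Assumes SomeFunction(a, b) = SomeFunction(b, a); if not, loop j from 0 and skip j = i.
            For j As Integer = i + 1 To myList.Count - 1
                If SomeFunction(myList(i), myList(j)) Then
                    hits += 1
                End If
            Next
        Next

        Console.WriteLine(hits)
    End Sub
End Module
```

This should print 2 for the sample data. It roughly halves the number of comparisons, but the loop is still O(N^2).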
See if this will work
' ForEach is used instead of Select because Select is lazy and would never run the body.
MyList.ForEach(Sub(x)
                   MyList.Except({x}).ToList().ForEach(Sub(y)
                                                           If x.SomeFunction(y) Then
                                                               DoSomething()
                                                           End If
                                                       End Sub)
               End Sub)