Is Try/Catch ever LESS expensive than a hash lookup? - vb.net

I'm aware that exception trapping can be expensive, but I'm wondering if there are cases when it's actually less expensive than a lookup?
For example, if I have a large dictionary, I could either test for the existence of a key:
If MyDictionary.ContainsKey(MyKey) Then _
    MyValue = MyDictionary(MyKey) ' This is 2 lookups just to get the value.
Or, I could catch an exception:
Try
    MyValue = MyDictionary(MyKey) ' Only doing 1 lookup now.
Catch e As Exception
    ' Didn't find it.
End Try
Is exception trapping always more expensive than lookups like the above, or is it less so in some circumstances?

The thing about dictionary lookups is that they happen in constant or near-constant time. It takes your computer about the same amount of time whether your dictionary holds one item or one million items. I bring this up because you're worried about making two lookups in a large dictionary, and the reality is that it's not much different from making two lookups in a small dictionary. As a side note, one of the implications here is that dictionaries are not always the best choice for small collections, though I normally find the extra clarity still outweighs any performance issues for those small collections.
One of the things that determines just how fast a dictionary can make its lookups is how long it takes to generate a hash value for a particular object. Some objects can do this much faster than others. That means the answer here depends on the kind of object in your dictionary. Therefore, the only way to know for sure is to build a version that tests each method a few hundred thousand times to find out which completes the set faster.
Another factor to keep in mind here is that it's mainly just the Catch block that is slow with exception handling, and so you'll want to look for the right combination of lookup hits and misses that reasonably matches what you'd expect in production. For this reason, you can't find a general guideline here, or if you do it's likely to be wrong. If you only rarely have a miss, then I would expect the exception handler to do much better (and, by virtue of a miss being somewhat, well, exceptional, it would also be the right solution). If you miss more often, I might prefer a different approach.
And while we're at it, let's not forget about Dictionary.TryGetValue()
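For reference, a minimal sketch of that approach, reusing the MyDictionary and MyKey names from the question and assuming an Integer value type. TryGetValue does a single lookup and reports a miss through its Boolean return value, so there is no second lookup and no exception to catch:

    Dim MyValue As Integer
    If MyDictionary.TryGetValue(MyKey, MyValue) Then
        ' Found it - MyValue now holds the stored value.
    Else
        ' Not found - MyValue holds the type's default value (0 for Integer).
    End If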

I tested performance of ContainsKey vs TryCatch, here are the results:
With debugger attached:
Without debugger attached:
Tested on Release build of a Console application with just the Sub Main and below code. ContainsKey is ~37000 times faster with debugger and still 355 times faster without debugger attached, so even if you do two lookups, it would not be as bad as if you needed to catch an extra exception. This is assuming you are looking for missing keys quite often.
Dim dict As New Dictionary(Of String, Integer)
With dict
    .Add("One", 1)
    .Add("Two", 2)
    .Add("Three", 3)
    .Add("Four", 4)
    .Add("Five", 5)
    .Add("Six", 6)
    .Add("Seven", 7)
    .Add("Eight", 8)
    .Add("Nine", 9)
    .Add("Ten", 10)
End With
Dim stw As New Stopwatch
Dim iterationCount As Long = 0
Do
    stw.Start()
    If Not dict.ContainsKey("non-existing key") Then 'always true
        stw.Stop()
        iterationCount += 1
    End If
    If stw.ElapsedMilliseconds > 5000 Then Exit Do
Loop
Dim stw2 As New Stopwatch
Dim iterationCount2 As Long = 0
Do
    Try
        stw2.Start()
        Dim value As Integer = dict("non-existing key") 'always throws exception
    Catch ex As Exception
        stw2.Stop()
        iterationCount2 += 1
    End Try
    If stw2.ElapsedMilliseconds > 5000 Then Exit Do
Loop
MsgBox("ContainsKey: " & iterationCount / 5 & " per second, TryCatch: " & iterationCount2 / 5 & " per second.")

If you are trying to find an item in a data structure of some kind which is not easily searched (e.g. finding an item containing the word "flabbergasted" in an unindexed string array of 100K items), then yes, letting it throw the exception would be faster because you'd only be doing the look-up once. If you check whether the item exists first, then get the item, you are doing the look-up twice. However, in your example, where you are looking up an item in a dictionary (hash table), it should be very quick, so doing the lookup twice would likely be faster than letting it fail, but it's hard to say without testing it. It all depends how quickly the hash value for the object can be calculated and how many items in the list share the same hash value.
As others have suggested, in the case of the Dictionary, the TryGetValue would provide the best of both methods. Other list types offer similar functionality.

Related

Match Words and Add Quantities vb.net

I am trying to program a way to read a text file and match all the values and their quantites. For example if the text file is like this:
Bread-10 Flour-2 Orange-2 Bread-3
I want to create a list with the total quantity of all the common words. I began my code, but I am having trouble understanding how to sum the values. I'm not asking for anyone to write the code for me, but I am having trouble finding resources. I have the following code:
Dim query = From data In IO.File.ReadAllLines("C:\User\Desktop\doc.txt")
Let name As String = data.Split("-")(0)
Let quantity As Integer = CInt(data.Split("-")(1))
Let sum As Integer = 0
For i As Integer = 0 To query.Count - 1
For j As Integer = i To
Next
Thanks
Ok, let's break this down. I've not seen the LET command used for a long time (back in the GWBASIC days!).
But, that's ok.
So, first up, we're going to assume your text file is like this:
Bread-10
Flour-2
Orange-2
Bread-3
As opposed to this:
Bread-10 Flour-2 Orange-2 Bread-3
Now, we could read one line and then process the information, or we can read all lines of text and THEN process the data. If the file is not huge (say a few hundred lines), then performance is not much of an issue, so let's just read in the whole file in one shot (your code also had this idea).
Your starting code is good. So, let's keep it (well, ok, very close).
A few things:
We don't need the LET for assignment. Older BASIC languages had it, and vb.net still supports it, but we don't need it. (You will still see examples of it floating around in vb.net - especially in what we call "class" module code, or "custom classes" - but let's leave that for another day.)
Now the next part? We could start building up an array, look for the existing value, and then add it. However, this would require a few extra arrays, and a few extra loops.
However, in .net land, we have a cool thing called a dictionary.
And that's just a fancy term for a collection VERY much like an array, but with some extra "fancy" features. The fancy feature is that it lets you put things into the list by a "key" name, and then pull that "value" back out by the key.
This saves us a good number of extra looping type of code.
And it also means we don't need an array for the results.
This key system is ALSO very fast (behind the scenes it uses some cool concepts - hash coding).
So, our code to do this would look like this:
Note I could have saved a few lines here or there - but that would make this code hard to read.
Given that you look to have Fortran, or older BASIC, language experience, let's try to keep the code style somewhat similar. It is stunning that vb.net still accepts even 40-year-old GWBASIC style syntax here.
Do note that arrays() in vb.net do have some fancy "find" options, but the dictionary structure is even nicer. It also means we can often traverse the results without, say, needing a for i = 1 to end-of-array loop and pulling out values that way.
We can use for each.
So this would work:
Dim MyData() As String ' an array() of strings - one line per array element
MyData = File.ReadAllLines("c:\test5\doc.txt") ' read each line into the array() (needs Imports System.IO)
Dim colSums As New Dictionary(Of String, Integer) ' to hold our values and sum them
Dim sKey As String
Dim sValue As Integer
For Each strLine As String In MyData
    sKey = Split(strLine, "-")(0)
    sValue = CInt(Split(strLine, "-")(1)) ' convert the text after the "-" to a number
    If colSums.ContainsKey(sKey) Then
        colSums(sKey) = colSums(sKey) + sValue
    Else
        colSums.Add(sKey, sValue)
    End If
Next
' display results
Dim KeyPair As KeyValuePair(Of String, Integer)
For Each KeyPair In colSums
    Debug.Print(KeyPair.Key & " = " & KeyPair.Value)
Next
The above results in this output in the debug window:
Bread = 13
Flour = 2
Orange = 2
I was tempted here to write this code using just pure arrays() in vb.net, as that would give you a good idea of the "older" types of coding and syntax we could use here - an approach that harks all the way back to those older PC BASIC systems.
While the dictionary feature is more advanced, it is worth the learning curve here, and it makes this problem a lot easier. I mean, if this was for a longer list? Then I would start to consider introducing some kind of database system.
However, without some database system, the dictionary feature is a welcome approach, due to that "key" value lookup ability and not having to loop. It's also a very high-speed system, so the result is not much looping code, and better yet we write less code.
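As an aside, the same totalling can also be written as a LINQ query, closer in spirit to the code you started with. This is just a sketch, assuming the same one-entry-per-line file layout as above:

    ' group the lines by the word before the "-" and sum the numbers after it
    Dim totals = From strLine In File.ReadAllLines("c:\test5\doc.txt")
                 Let parts = strLine.Split("-"c)
                 Group By name = parts(0) Into total = Sum(CInt(parts(1)))

    For Each entry In totals
        Debug.Print(entry.name & " = " & entry.total)
    Next

Same output as the dictionary version, but the dictionary version is probably easier to follow when you are starting out.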

Mid() usage and for loops - Is this good practice?

Ok, so I was in college and I was talking to my teacher, and he said my code isn't good practice. I'm a bit confused as to why, so here's the situation. We basically created a for loop; however, he declared his loop counter outside of the loop because it's considered good practice (to him), even though we never used the variable later on in the code, so to me it looks like a waste of memory. We did more in the code than just use a message box, but the idea was to get each character from a string and do something with it. He also used the Mid() function to retrieve the character in the string, while I indexed into the string directly. Here's an example of how he would write his code:
Dim i As Integer = 0
Dim justastring As String = "test"
For i = 1 To justastring.Length
    MsgBox(Mid(justastring, i, 1))
Next
And here's an example of how I would write my code:
Dim justastring As String = "test"
For i = 0 To justastring.Length - 1
    MsgBox(justastring(i))
Next
Would anyone be able to provide the advantages and disadvantages of each method and why and whether or not I should continue how I am?
Another approach would be, to just use a For Each on the string.
Like this no index variable is needed.
Dim justastring As String = "test"
For Each c As Char In justastring
MsgBox(c)
Next
I would suggest doing it your way, because you could have variables hanging around consuming (albeit a small amount of) memory, but more importantly, it is better practice to define objects with as little scope as possible. In your teacher's code, the variable i is still accessible when the loop is finished. There are occasions when this is desirable, but normally, if you're only using a variable in a limited amount of code, then you should only declare it within the smallest block where it is needed.
As for your question about the Mid function: individual characters, as you know, can be accessed simply by treating the string as an array of characters. After some basic benchmarking, using the Mid function takes a lot longer to process than just accessing the character by its index. In relatively simple bits of code, this doesn't make much difference, but if you're doing it millions of times in a loop, it makes a huge difference.
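For what it's worth, the basic benchmarking here needs nothing fancier than a Stopwatch around each version (the iteration count and test string below are arbitrary):

    Dim s As String = "test"
    Dim sw As New Stopwatch()

    sw.Start()
    For n As Integer = 1 To 10000000
        Dim ch As String = Mid(s, 2, 1) ' Mid is 1-based and returns a String
    Next
    sw.Stop()
    Debug.Print("Mid():    " & sw.ElapsedMilliseconds & " ms")

    sw.Restart()
    For n As Integer = 1 To 10000000
        Dim ch As Char = s(1) ' indexing is 0-based and returns a Char
    Next
    sw.Stop()
    Debug.Print("Indexing: " & sw.ElapsedMilliseconds & " ms")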
There are other factors to consider. Such as code readability and modification of the code, but there are plenty of websites dealing with that sort of thing.
Finally I would suggest changing some compiler options in your visual studio
Option Strict to On
Option Infer to Off
Option Explicit to On
It means writing more code, but the code is safer and you'll make fewer mistakes. Have a look here for an explanation.
In your code, it would mean that you have to write
Dim justastring As String = "test"
For i As Integer = 0 To justastring.Length - 1
    MsgBox(justastring(i))
Next
This way, you know that i is definitely an integer. Consider the following ..
Dim i
Have you any idea what type it is? Me neither.
The compiler doesn't know either, so it defines it as an Object type, which could hold anything: a string, an integer, a list..
Consider this code.
Dim i
Dim x
x = "ab"
For i = x To endcount - 1
    t = Mid(s, 999)
Next
The compiler will compile it, but when it is executed you'll get a runtime exception (an InvalidCastException, because "ab" can't be converted to a number to start the loop). In this case it's easy to see what is wrong, but often it isn't. And numbers in strings can be a whole new can of worms.
Hope this helps.

Repeatedly checking entire scripting dictionary via loop bad practice? VB6

I have a loop that checks for values and adds them to a dictionary if they do not exist. This check happens inside of a loop so it happens repeatedly. As the dictionary grows I imagine there is an inner loop going on with each check that is becoming costly. In an effort to speed up my entire routine I am wondering if this might be something I should optimize? If so, how?
keycnt = 1
For x = 1 To 500000
    STRVAL = returnarray(8, x)
    If Not STRDIC.Exists(STRVAL) Then
        STRDIC.Add STRVAL, keycnt
        keycnt = keycnt + 1
    End If
Next x
If your dictionary is a Scripting.Dictionary and not some [poorly implemented] custom-made data structure class, then a key lookup should be O(1) complexity, not O(n) as you seem to imply; the growing number of keys has no impact on performance.
Key lookup in a hash table or dictionary is basically free, the only thing to fix in that code is the indentation.

Comparing an item in a list against other items in the same list in VB.NET

Simplified, I have a List(Of MyObj), and I want to iterate through that list and compare each element to all other elements in the same list, excluding (if possible) the same element. I have a solution that works, but it's slow and uses double For loops. It may possibly have also summoned Cthulhu from his sleep.
Is there a better approach? Linq, perhaps? Or some fancy algorithm? This below is a sanitized version of what I have:
Dim MyList As New List(Of MyObj)({Obj1, Obj2, Obj3, Obj4, Obj5, Obj6})
If MyList.Count > 0 Then
    For i = 0 To (MyList.Count - 1) Step 1
        For j As Int32 = 0 To (MyList.Count - 1) Step 1
            If MyList(i).GetHashCode = MyList(j).GetHashCode Then
                Continue For
            Else
                If MyList(i).SomeFunction(MyList(j)) Then
                    Call DoSomething()
                End If
            End If
        Next j
    Next i
Else
    ' Error Code Here.
End If
This will work in O(M*N) where N is ObjCount and M is the number of non-duplicate objects. Your current solution is O(N^2).
You need a Hash Function. You can determine whether GetHashCode will suffice or whether you need to implement Sha1.
Instantiate a HashSet (or HashTable, depending on your Hash Function)
Add each object, if it does not already exist, into the HashSet or HashTable.
For each object in the HashSet, execute SomeFunction() against every other object in the HashSet. If you dump into an array and iterate via indexes, you only have to compare indexes, rather than objects.
For i As Integer = 0 To MyHashResultsArray.Count - 1
    For j As Integer = 0 To MyHashResultsArray.Count - 1
        If i <> j Then
            MyHashResultsArray(i).DoSomething(j)
        End If
    Next
Next
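Putting the steps together, a sketch of the whole approach might look like this (MyObj, SomeFunction and DoSomething are the names from the question; note that HashSet relies on your type's GetHashCode/Equals to decide what counts as a duplicate):

    ' build a HashSet to drop duplicates, then dump it to an array so the
    ' pairwise comparison only has to check indexes
    Dim unique As New HashSet(Of MyObj)()
    For Each item As MyObj In MyList
        unique.Add(item) ' Add returns False and does nothing for a duplicate
    Next

    Dim arr() As MyObj = unique.ToArray()
    For i As Integer = 0 To arr.Length - 1
        For j As Integer = 0 To arr.Length - 1
            If i <> j AndAlso arr(i).SomeFunction(arr(j)) Then
                DoSomething()
            End If
        Next
    Next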
Important
This is only a good solution IF there exists a significant amount of duplicates, perhaps a duplicate-level of 10% would be necessary to consider this solution, except for very large values of N. If N gets too large, a re-engineering of the application may be necessary to hopefully avoid the need for M actions against M objects.
Edit
Much of the comment discussion was based upon my misunderstanding of the Author's needs regarding the DoSomething() function.
Barring any potential problems with using GetHashCode to check for object equality (best not to do this - it'll only bite you at some point - and it's probably this that has awakened Cthulhu!), your solution is about as fast as it's likely to get.
Sure, you can tweak it, but it will remain O(N^2), that is, the runtime will be of the order of the square of the number of elements in your list. If you double the number of elements, your runtime will increase by a factor of 4.
See if this will work
MyList.ForEach(Sub(x)
                   MyList.Except({x}).ToList().ForEach(Sub(y)
                                                           If x.SomeFunction(y) Then
                                                               DoSomething()
                                                           End If
                                                       End Sub)
               End Sub)

Loop takes forever with large count

This loop takes forever to run as the number of items in the loop approaches anything close to and over 1,000 - close to 10 minutes. This needs to run fast for counts all the way up to 30-40 thousand.
'Add all Loan Record Lines
Dim loans As List(Of String) = lar.CreateLoanLines()
Dim last As Integer = loans.Count - 1
For i = 0 To last
    If i = last Then
        s.Append(loans(i))
    Else
        s.AppendLine(loans(i))
    End If
Next
s is a StringBuilder. The first line there
Dim loans As List(Of String) = lar.CreateLoanLines()
Runs in only a few seconds even with thousands of records. It's the actual loop that's taking a while.
How can this be optimized???
Set the initial capacity of your StringBuilder to a large value. (Ideally, large enough to contain the entire final string.) Like so:
s = New StringBuilder(loans.Count * averageExpectedStringSize)
If you don't specify a capacity, the builder will likely end up doing a large amount of internal reallocations, and this will kill performance.
You could take the special case out of the loop, so you wouldn't need to be checking it inside the loop. I would expect this to have almost no impact on performance, however.
For i = 0 To last - 1
    s.AppendLine(loans(i))
Next
s.Append(loans(last))
Though, internally, the code is very similar, if you're using .NET 4, I'd consider replacing your method with a single call to String.Join:
Dim result As String = String.Join(Environment.NewLine, lar.CreateLoanLines())
I can't see how the code you have pointed out could be slow unless:
The strings you are dealing with are huggggge (e.g. if the resulting string is 1 gigabyte).
You have another process running on your machine consuming all your clock cycles.
You haven't got enough memory in your machine.
Try stepping through the code line by line and check that the strings contain the data that you expect, and check Task Manager to see how much memory your application is using and how much free memory you have.
My guess would be that every time you're using append it's creating a new string. You seem to know how much memory you'll need; if you allocate all of the memory first and then just copy into it, it should run much faster. Although I may be confused as to how vb.net works.
You could look at doing this another way.
Dim str As String = String.Join(Environment.NewLine, loans.ToArray)