bubble vs insertion sort - trying to write a program to determine which is more efficient - vb.net

i'm trying to compare these 2 sort algorithms.
I've written a vb.net console program and used excel to create a csv file of 10000 integers randomly created between 0 and 100000.
Insertion sort seems to take approx 10x longer which can't be correct can it?
can anyone point out where i'm going wrong?
module Module1
Dim unsortedArray(10000) As integer
sub main
dim startTick as long
dim endTick as long
loadDataFromFile
startTick = date.now.ticks
insertionsort
endTick = date.now.ticks
console.writeline("ticks for insertion sort = " & (endTick-startTick))
loadDataFromFile
startTick = date.now.ticks
bubblesort
endTick = date.now.ticks
console.writeline("ticks for bubble sort = " & (endTick-startTick))
end sub
sub bubbleSort
dim temp as integer
dim swapped as boolean
dim a as integer = unsortedArray.getupperbound(0)-1
do
swapped=false
for i = 0 to a
if unsortedArray(i)>unsortedArray(i+1) then
temp=unsortedArray(i)
unsortedArray(i)=unsortedArray(i+1)
unsortedArray(i+1)=temp
swapped=true
end if
next i
'a = a - 1
loop until not swapped
end sub
sub insertionSort()
dim temp as string
dim ins as integer
dim low as integer = 0
dim up as integer = unsortedArray.getupperbound(0)
console.writeline()
for i = 1 to up
temp = unsortedArray(i)
ins = i-1
while (ins >= 0) andalso (temp < unsortedArray(ins))
unsortedArray(ins+1) = unsortedArray(ins)
ins = ins -1
end while
unsortedArray(ins+1) = temp
next
end sub
sub loadDataFromFile()
dim dataItem as integer
fileopen(1,FileIO.FileSystem.CurrentDirectory & "\10000.csv", openmode.input)
'set up to loop through each row in the array
for i = 0 to 9999
input(1,dataItem)
'save that data item in correct array positon
unsortedArray(i) = dataItem
next i
fileclose(1)
end sub

dim temp as string
You've declared your temporary variable as a string instead of an integer. VB.Net is perfectly happy to allow you to do this sort of sloppy thing, and it will convert the numeric value to a string and back. This is a very expensive operation.
If you go into your project options, under "Compile", do yourself a favour and turn on "Option Strict". This will disallow implicit type conversions like this and force you to fix it, showing you exactly where you made the error.
"Option Strict" is off by default for legacy reasons, simply to allow badly written legacy VB code to be compiled without complaint in vb.net. There is otherwise no sane reason to leave it turned off.
Changing the declaration to
Dim temp As Integer
reveals that the insertion sort is indeed about 3-5 times faster than the bubble on average.

Related

VB.net, Linq -- How to compare items in List of Strings

I have got a list of Strings looking like this:
The items of the list of Strings are formated like this "#,#" where # stands for an integer number and # stands for a string or number.
I need to find the index of the first occurrence where the integer number is lower than the integer number of the previous item.
I am able to find it using a loop over all entries like this:
For X = 0 To UBound(TempList.ToArray) - 1
If Val(Left(TempList(X), InStr(TempList(X), ",") - 1)) > Val(Left(TempList(X + 1), InStr(TempList(X + 1), ",") - 1)) Then
Dim Result As String = TempList(X) & " -- " & TempList(X + 1)
End If
Next
I am sure that this can be done much smarter using linq - but my very poor knowledge regarding linq is not enough ...
Can someone please give me a hint?
Linq is cool but it is not necessarily faster. It is looping internally.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim TempList As New List(Of String) From {"450,245.5", "510,1", "520,1", "703,1", "704,0", "705,0", "706,0", "901,1.244", "921,3", "1,1", "2,1", "3,0"}
Dim Result As String = ""
For i As Integer = 1 To TempList.Count - 1
If CInt(TempList(i).Split(","c)(0)) < CInt(TempList(i - 1).Split(","c)(0)) Then
Result = $"{TempList(i)} At index {i} -- {TempList(i - 1)}"
Exit For 'Stop looping when found
End If
Next
MessageBox.Show(Result)
End Sub
You can use LINQ for the loop, but I do think a For / For Each is sufficient - see Mary's answer.
But at least, you could use LINQ to parse your list into something other than strings to make it more manageable when it's used. This will decouple your code a bit and will make it clearer. I'll hold the pair of data in a Tuple(Of Double, String). If I knew the first item was distinct, I would use a Dictionary(Of Double, String).
So for fun, you can use TakeWhile instead of a For / For Each.
Dim TempList = { "450,245.5", "510,1", "520,1", "701,0", "702,0", "703,1", "704,0", "705,0", "706,0", "720,0", "901.2,455", "921,3", "1,1", "2,1", "3,0"}
Dim parsedList = TempList.
Select(Function(s) s.Split(","c)).
Select(Function(ss) (CDbl(ss(0)), ss(1)))
Dim lastValue As Double = Double.NaN
Dim index = parsedList.TakeWhile(
Function(t)
Dim result = Double.IsNaN(lastValue) OrElse t.Item1 > lastValue
lastValue = t.Item1
Return result
End Function).Count()
Dim item = parsedList(index)
Console.WriteLine($"Index: {index}, number: {item.Item1}, string: {item.Item2}")
Index: 12, number: 1, string: 1
This is basically just making a For Each with LINQ. It's probably not necessary, and a simple loop is arguably more readable, and you won't gain any benefit in performance either.

VBA Collection Class: Unwated Data Overwriting

I have a Collection Class (or rather a dictionary class, in this case) that is used to store a variable amount of edge objects. When I try to populate the Dictionary that holds all the information via loop, the data is continuously overwritten and I cannot seem to figure out why. The code for the class in question follows:
Option Explicit
Private pEdges As New Scripting.Dictionary
Property Get Count() As Long
Count = pEdges.Count
End Property
Property Get EdgeByName(ByVal iName As Variant) As cEdge
Set EdgeByName = pEdges(iName)
End Property
'Would it be better to pass all of the data to this add sub, and create
'the class objects here, rather than creating a temporary class object and
'just passing it along?
Sub Add(ByVal iEdge As cEdge)
Dim Edge As New cEdge
Set Edge = iEdge
pEdges.Add Edge.Name, Edge
End Sub
Sub Remove(ByVal iName As Variant)
pEdges.Remove (iName)
End Sub
Sub RemoveAll()
pEdges.RemoveAll
End Sub
Sub PrintNames()
Dim Key As Variant
For Each Key In pEdges
Debug.Print Key & " - " & pEdges(Key).Name & vbCrLf;
Next
Debug.Print vbdrlf;
End Sub
Sub that generates the Edges object follows:
Sub CalculateEdges(cCavities() As cCavity, dEdges As cEdges)
Dim i As Integer
For i = 1 To UBound(cCavities)
Dim TempEdge As cEdge
Set TempEdge = New cEdge
Dim AdjSize As Integer
AdjSize = cCavities(i).AdjacencySize
If AdjSize> MaxEdges Then MaxEdges = AdjSize
Dim j As Integer
For j = 1 To AdjSize
With TempEdge
'Edge Names are a combination of two node names
.Name = cCavities(i).Name & cCavities(i).Adjacency(j)
'Sets the start node (Object) for the edge
.SetNode cCavities(i), 0
'Sets the end node (Object) for the edge
.SetNode BackGround.NodeByName(cCavities, cCavities(i).Adjacency(j)), 1
'Used later in program
.Value = 0
End With
dEdges.Add TempEdge
dEdges.PrintNames
Next j
Next i
End Sub
The output of the dEdges.PrintNames sub is what I have been using for debugging this (since the Watches window doesn't show the item data of a dictionary).
As the loops go on it prints the Key and the Name Value of the edge object that the key corresponds to. If working correctly, these two strings should be identical. As it is though, every time I add a new edge object to the dictionary, it overwrites the objects for all the previously entered keys. I have the suspicion that this is related to the fact that I create a TempEdge Variable to pass to the Collection Class, but I am not sure.
Example of output:
C1C2 - C1C2
C1C2 - C1C3
C1C3 - C1C3
C1C2 - C1C4
C1C3 - C1C4
C1C4 - C1C4
ETC
This is just one single data point being tested, but let me assure you that all the variables inside the cEdge object are overwritten, not just the name string. It is simply the easiest to check since it is just a string.
As a side note, if there is a way to see the Object stored in the dictionary, similar to the "Watches" window, I would very much like to know how to do it. The entire reason I am even using the temp edge at this point is so I can keep track of what data is going into the dictionary at any given point in the loop.
Second side note, If I can get this working I will most likely switch the cCavities array to a similar collection class. It is not currently one because I cant seem to get them working right.
Moving the Set "TempEdge = New cEdge" into the loop will create a new instance and a new pointer location with every loop while maintaining your collections references to previous pointers.
Sub CalculateEdges(cCavities() As cCavity, dEdges As cEdges)
Dim i As Integer
For i = 1 To UBound(cCavities)
Dim TempEdge As cEdge
Dim AdjSize As Integer
AdjSize = cCavities(i).AdjacencySize
If AdjSize> MaxEdges Then MaxEdges = AdjSize
Dim j As Integer
For j = 1 To AdjSize
Set TempEdge = New cEdge
With TempEdge
'Edge Names are a combination of two node names
.Name = cCavities(i).Name & cCavities(i).Adjacency(j)
'Sets the start node (Object) for the edge
.SetNode cCavities(i), 0
'Sets the end node (Object) for the edge
.SetNode BackGround.NodeByName(cCavities, cCavities(i).Adjacency(j)), 1
'Used later in program
.Value = 0
End With
dEdges.Add TempEdge
dEdges.PrintNames
Next j
Next i
End Sub
I went ahead with the idea to pass along all the data to the add routine, and it seems to have solved the issue. I would still like to know why the method I was using did not work, though, so please feel free to comment or answer with regards to that.
The solution was to change the cEdges.Add Sub to accept all the individual parameters that were once passed to the temporary edge variable:
Sub Add(ByVal iName As String, iNode1 As cCavity, iNode2 As cCavity, iValue As Integer)
Dim Edge As New cEdge
With Edge
.Name = iName
.SetNode iNode1, 0
.SetNode iNode2, 1
.Value = iValue
End With
pEdges.Add Edge.Name, Edge
End Sub
This changes the populating loop to look like:
Sub CalculateEdges(cCavities() As cCavity, dEdges As cEdges)
Dim i As Integer
For i = 1 To UBound(cCavities)
Dim AdjSize As Integer
AdjSize = cCavities(i).AdjacencySize
If AdjSize > MaxEdges Then MaxEdges = AdjSize
Dim j As Integer
For j = 1 To AdjSize
dEdges.Add cCavities(i).Name & cCavities(i).Adjacency(j), cCavities(i), BackGround.NodeByName(cCavities, cCavities(i).Adjacency(j)), 0
dEdges.PrintNames
Next j
Next i
End Sub
This code, especially the .Add line, could be cleaned up. I will most likely do that, but this is fine for now.
EDIT: Upon further research and a bit more trial and error, I have discovered the reason for the data being overwritten. The Set keyword only creates a pointer to the original value, effectively making my above code have one object, the TempEdge variable, and a whole bunch of different heads that pointed to it. That is why when the Temp edge was edited, all the subsequent heads changes.

Why is assigning the Value property of cell causing code to end aburptly?

Private Sub FillRow(programCell As Range, storedProgramCell As Range)
Dim counter As Integer
For counter = 3 To 9
Dim cellOffset As Integer
cellOffset = counter - 3
Dim currentStoredCell As Range
Set currentStoredCell = storedProgramCell.Offset(0, cellOffset)
Dim value As String
value = currentStoredCell.value
Dim currentTargetCell As Range
Set currentTargetCell = programCell.Offset(0, cellOffset)
MsgBox currentStoredCell.value 'Works correctly, prints correct value
currentTargetCell.value = value
Next counter
End Sub
The line:
currentTargetCell.value = value
causes the code to stop executing, with no error.
I added the expression to my watch list, then stepped through the routine. The expression was seen as a Boolean:
This makes me think the expression is being viewed as a comparison, and the program abruptly ends since the returned Boolean is not being stored or used anywhere. I wouldn't doubt if I were wrong though.
I'm new to VBA, struggling to debug my program, so please forgive me if this is a petty mistake. I couldn't find any sources online that explains this problem.
Replace your subroutine with following code:
Private Sub FillRow(Dst As Range, Src As Range)
Dim x As Integer
Dim v As Variant
Dim Srcx As Range
Dim Dstx As Range
Debug.Print "FillRow"
Debug.Print Src.Address
Debug.Print Dst.Address
Debug.Print "Loop"
For x = 0 To 6
Debug.Print x
Set Srcx = Src.Offset(0, x)
Debug.Print Srcx.Address
v = Srcx.Value
Debug.Print TypeName(v)
Set Dstx = Dst.Offset(0, x)
Debug.Print Dstx.Address
Dstx.Value = v
Next
Debug.Print "Completed"
End Sub
Run and post in your question Immediate window output.
Value is a reserved word, even if vba does not raise an error on this name, you should not use it. Name it something else. Also, try setting it as a variant.

Readline Error While Reading From .txt file in vb.net

I have a Streamreader which is trowing an error after checking every line in Daycounts.txt. It is not a stable txt file. String lines in it are not stable. Count of lines increasing or decresing constantly. Thats why I am using a range 0 to 167. But
Here is the content of Daycounts.txt: Daycounts
Dim HourSum as integer
Private Sub Change()
Dim R As IO.StreamReader
R = New IO.StreamReader("Daycounts.txt")
Dim sum As Integer = 0
For p = 0 To 167
Dim a As String = R.ReadLine
If a.Substring(0, 2) <> "G." Then
sum += a.Substring(a.Length - 2, 2)
Else
End If
Next
HourSum = sum
R.Close()
End Sub
If you don't know how many lines are present in your text file then you could use the method File.ReadAllLines to load all lines in memory and then apply your logic
Dim HourSum As Integer
Private Sub Change()
Dim lines = File.ReadAllLines("Daycounts.txt")
Dim sum As Integer = 0
For Each line In lines
If line.Substring(0, 2) <> "G." Then
sum += Convert.ToInt32(line.Substring(line.Length - 2, 2))
Else
....
End If
Next
HourSum = sum
End Sub
This is somewhat inefficient because you loop over the lines two times (one to read them in, and one to apply your logic) but with a small set of lines this should be not a big problem
However, you could also use File.ReadLines that start the enumeration of your lines without loading them all in memory. According to this question, ReadLines locks writes your file until the end of your read loop, so, perhaps this could be a better option for you only if you don't have something external to your code writing concurrently to the file.
Dim HourSum As Integer
Private Sub Change()
Dim sum As Integer = 0
For Each line In File.ReadLines("Daycounts.txt")
If line.Substring(0, 2) <> "G." Then
sum += Convert.ToInt32(line.Substring(line.Length - 2, 2))
Else
....
End If
Next
HourSum = sum
End Sub
By the way, notice that I have added a conversion to an integer against the loaded line. In your code, the sum operation is applied directly on the string. This could work only if you have Option Strict set to Off for your project. This setting is a very bad practice maintained for VB6 compatibility and should be changed to Option Strict On for new VB.NET projects

Is there any way I can speed up this VBA algorithm?

I am looking to implement a VBA trie-building algorithm that is able to process a substantial English lexicon (~50,000 words) in a relatively short amount of time (less than 15-20 seconds). Since I am a C++ programmer by practice (and this is my first time doing any substantial VBA work), I built a quick proof-of-concept program that was able to complete the task on my computer in about half a second. When it came time to test the VBA port however, it took almost two minutes to do the same -- an unacceptably long amount of time for my purposes. The VBA code is below:
Node Class Module:
Public letter As String
Public next_nodes As New Collection
Public is_word As Boolean
Main Module:
Dim tree As Node
Sub build_trie()
Set tree = New Node
Dim file, a, b, c As Integer
Dim current As Node
Dim wordlist As Collection
Set wordlist = New Collection
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
Do While Not EOF(file)
Dim line As String
Line Input #file, line
wordlist.add line
Loop
For a = 1 To wordlist.Count
Set current = tree
For b = 1 To Len(wordlist.Item(a))
Dim match As Boolean
match = False
Dim char As String
char = Mid(wordlist.Item(a), b, 1)
For c = 1 To current.next_nodes.Count
If char = current.next_nodes.Item(c).letter Then
Set current = current.next_nodes.Item(c)
match = True
Exit For
End If
Next c
If Not match Then
Dim new_node As Node
Set new_node = New Node
new_node.letter = char
current.next_nodes.add new_node
Set current = new_node
End If
Next b
current.is_word = True
Next a
End Sub
My question then is simply, can this algorithm be sped up? I saw from some sources that VBA Collections are not as efficient as Dictionarys and so I attempted a Dictionary-based implementation instead but it took an equal amount of time to complete with even worse memory usage (500+ MB of RAM used by Excel on my computer). As I say I am extremely new to VBA so my knowledge of both its syntax as well as its overall features/limitations is very limited -- which is why I don't believe that this algorithm is as efficient as it could possibly be; any tips/suggestions would be greatly appreciated.
Thanks in advance
NB: The lexicon file referred to by the code, "corncob_caps.txt", is available here (download the "all CAPS" file)
There are a number of small issues and a few larger opportunities here. You did say this is your first vba work, so forgive me if I'm telling you things you already know
Small things first:
Dim file, a, b, c As Integer declares file, a and b as variants. Integer is 16 bit sign, so there may be risk of overflows, use Long instead.
DIM'ing inside loops is counter-productive: unlike C++ they are not loop scoped.
The real opportunity is:
Use For Each where you can to iterate collections: its faster than indexing.
On my hardware your original code ran in about 160s. This code in about 2.5s (both plus time to load word file into the collection, about 4s)
Sub build_trie()
Dim t1 As Long
Dim wd As Variant
Dim nd As Node
Set tree = New Node
' Dim file, a, b, c As Integer : declares file, a, b as variant
Dim file As Integer, a As Long, b As Long, c As Long ' Integer is 16 bit signed
Dim current As Node
Dim wordlist As Collection
Set wordlist = New Collection
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
' no point in doing inside loop, they are not scoped to the loop
Dim line As String
Dim match As Boolean
Dim char As String
Dim new_node As Node
Do While Not EOF(file)
'Dim line As String
Line Input #file, line
wordlist.Add line
Loop
t1 = GetTickCount
For Each wd In wordlist ' for each is faster
'For a = 1 To wordlist.Count
Set current = tree
For b = 1 To Len(wd)
'Dim match As Boolean
match = False
'Dim char As String
char = Mid$(wd, b, 1)
For Each nd In current.next_nodes
'For c = 1 To current.next_nodes.Count
If char = nd.letter Then
'If char = current.next_nodes.Item(c).letter Then
Set current = nd
'Set current = current.next_nodes.Item(c)
match = True
Exit For
End If
Next nd
If Not match Then
'Dim new_node As Node
Set new_node = New Node
new_node.letter = char
current.next_nodes.Add new_node
Set current = new_node
End If
Next b
current.is_word = True
Next wd
Debug.Print "Time = " & GetTickCount - t1 & " ms"
End Sub
EDIT:
loading the word list into a dynamic array will reduce load time to sub second. Be aware that Redim Preserve is expensive, so do it in chunks
Dim i As Long, sz As Long
sz = 10000
Dim wordlist() As String
ReDim wordlist(0 To sz)
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
i = 0
Do While Not EOF(file)
'Dim line As String
Line Input #file, line
wordlist(i) = line
i = i + 1
If i > sz Then
sz = sz + 10000
ReDim Preserve wordlist(0 To sz)
End If
'wordlist.Add line
Loop
ReDim Preserve wordlist(0 To i - 1)
then loop through it like
For i = 0 To UBound(wordlist)
wd = wordlist(i)
I'm out of practice with VBA, but IIRC, iterating the Collection using For Each should be a bit faster than going numerically:
Dim i As Variant
For Each i In current.next_nodes
If i.letter = char Then
Set current = i
match = True
Exit For
End If
Next node
You're also not using the full capabilities of Collection. It's a Key-Value map, not just a resizeable array. You might get better performance if you use the letter as a key, though looking up a key that isn't present throws an error, so you have to use an ugly error workaround to check for each node. The inside of the b loop would look like:
Dim char As String
char = Mid(wordlist.Item(a), b, 1)
Dim node As Node
On Error Resume Next
Set node = Nothing
Set node = current.next_nodes.Item(char)
On Error Goto 0
If node Is Nothing Then
Set node = New Node
current.next_nodes.add node, char
Endif
Set current = node
You won't need the letter variable on class Node that way.
I didn't test this. I hope it's all right...
Edit: Fixed the For Each loop.
Another thing you can do which will possibly be slower but will use less memory is use an array instead of a collection, and resize with each added element. Arrays can't be public on classes, so you have to add methods to the class to deal with it:
Public letter As String
Private next_nodes() As Node
Public is_word As Boolean
Public Sub addNode(new_node As Node)
Dim current_size As Integer
On Error Resume Next
current_size = UBound(next_nodes) 'ubound throws an error if the array is not yet allocated
On Error GoTo 0
ReDim next_nodes(0 To current_size) As Node
Set next_nodes(current_size) = new_node
End Sub
Public Function getNode(letter As String) As Node
Dim n As Variant
On Error Resume Next
For Each n In next_nodes
If n.letter = letter Then
Set getNode = n
Exit Function
End If
Next
End Function
Edit: And a final optimization strategy, get the Integer char value with the Asc function and store that instead of a String.
You really need to profile it, but if you think Collections are slow maybe you can try using dynamic arrays?