Readline Error While Reading From .txt file in vb.net - vb.net

I have a Streamreader which is trowing an error after checking every line in Daycounts.txt. It is not a stable txt file. String lines in it are not stable. Count of lines increasing or decresing constantly. Thats why I am using a range 0 to 167. But
Here is the content of Daycounts.txt: Daycounts
Dim HourSum as integer
Private Sub Change()
Dim R As IO.StreamReader
R = New IO.StreamReader("Daycounts.txt")
Dim sum As Integer = 0
For p = 0 To 167
Dim a As String = R.ReadLine
If a.Substring(0, 2) <> "G." Then
sum += a.Substring(a.Length - 2, 2)
Else
End If
Next
HourSum = sum
R.Close()
End Sub

If you don't know how many lines are present in your text file then you could use the method File.ReadAllLines to load all lines in memory and then apply your logic
Dim HourSum As Integer
Private Sub Change()
Dim lines = File.ReadAllLines("Daycounts.txt")
Dim sum As Integer = 0
For Each line In lines
If line.Substring(0, 2) <> "G." Then
sum += Convert.ToInt32(line.Substring(line.Length - 2, 2))
Else
....
End If
Next
HourSum = sum
End Sub
This is somewhat inefficient because you loop over the lines two times (one to read them in, and one to apply your logic) but with a small set of lines this should be not a big problem
However, you could also use File.ReadLines that start the enumeration of your lines without loading them all in memory. According to this question, ReadLines locks writes your file until the end of your read loop, so, perhaps this could be a better option for you only if you don't have something external to your code writing concurrently to the file.
Dim HourSum As Integer
Private Sub Change()
Dim sum As Integer = 0
For Each line In File.ReadLines("Daycounts.txt")
If line.Substring(0, 2) <> "G." Then
sum += Convert.ToInt32(line.Substring(line.Length - 2, 2))
Else
....
End If
Next
HourSum = sum
End Sub
By the way, notice that I have added a conversion to an integer against the loaded line. In your code, the sum operation is applied directly on the string. This could work only if you have Option Strict set to Off for your project. This setting is a very bad practice maintained for VB6 compatibility and should be changed to Option Strict On for new VB.NET projects

Related

bubble vs insertion sort - trying to write a program to determine which is more efficient

i'm trying to compare these 2 sort algorithms.
I've written a vb.net console program and used excel to create a csv file of 10000 integers randomly created between 0 and 100000.
Insertion sort seems to take approx 10x longer which can't be correct can it?
can anyone point out where i'm going wrong?
module Module1
Dim unsortedArray(10000) As integer
sub main
dim startTick as long
dim endTick as long
loadDataFromFile
startTick = date.now.ticks
insertionsort
endTick = date.now.ticks
console.writeline("ticks for insertion sort = " & (endTick-startTick))
loadDataFromFile
startTick = date.now.ticks
bubblesort
endTick = date.now.ticks
console.writeline("ticks for bubble sort = " & (endTick-startTick))
end sub
sub bubbleSort
dim temp as integer
dim swapped as boolean
dim a as integer = unsortedArray.getupperbound(0)-1
do
swapped=false
for i = 0 to a
if unsortedArray(i)>unsortedArray(i+1) then
temp=unsortedArray(i)
unsortedArray(i)=unsortedArray(i+1)
unsortedArray(i+1)=temp
swapped=true
end if
next i
'a = a - 1
loop until not swapped
end sub
sub insertionSort()
dim temp as string
dim ins as integer
dim low as integer = 0
dim up as integer = unsortedArray.getupperbound(0)
console.writeline()
for i = 1 to up
temp = unsortedArray(i)
ins = i-1
while (ins >= 0) andalso (temp < unsortedArray(ins))
unsortedArray(ins+1) = unsortedArray(ins)
ins = ins -1
end while
unsortedArray(ins+1) = temp
next
end sub
sub loadDataFromFile()
dim dataItem as integer
fileopen(1,FileIO.FileSystem.CurrentDirectory & "\10000.csv", openmode.input)
'set up to loop through each row in the array
for i = 0 to 9999
input(1,dataItem)
'save that data item in correct array positon
unsortedArray(i) = dataItem
next i
fileclose(1)
end sub
dim temp as string
You've declared your temporary variable as a string instead of an integer. VB.Net is perfectly happy to allow you to do this sort of sloppy thing, and it will convert the numeric value to a string and back. This is a very expensive operation.
If you go into your project options, under "Compile", do yourself a favour and turn on "Option Strict". This will disallow implicit type conversions like this and force you to fix it, showing you exactly where you made the error.
"Option Strict" is off by default for legacy reasons, simply to allow badly written legacy VB code to be compiled without complaint in vb.net. There is otherwise no sane reason to leave it turned off.
Changing the declaration to
Dim temp As Integer
reveals that the insertion sort is indeed about 3-5 times faster than the bubble on average.

Progress bar with VB.NET Console Application

I've written a parsing utility as a Console Application and have it working pretty smoothly. The utility reads delimited files and based on a user value as a command line arguments splits the record to one of 2 files (good records or bad records).
Looking to do a progress bar or status indicator to show work performed or remaining work while parsing. I could easily write a <.> across the screen within the loop but would like to give a %.
Thanks!
Here is an example of how to calculate the percentage complete and output it in a progress counter:
Option Strict On
Option Explicit On
Imports System.IO
Module Module1
Sub Main()
Dim filePath As String = "C:\StackOverflow\tabSeperatedFile.txt"
Dim FileContents As String()
Console.WriteLine("Reading file contents")
Using fleStream As StreamReader = New StreamReader(IO.File.Open(filePath, FileMode.Open, FileAccess.Read))
FileContents = fleStream.ReadToEnd.Split(CChar(vbTab))
End Using
Console.WriteLine("Sorting Entries")
Dim TotalWork As Decimal = CDec(FileContents.Count)
Dim currentLine As Decimal = 0D
For Each entry As String In FileContents
'Do something with the file contents
currentLine += 1D
Dim progress = CDec((currentLine / TotalWork) * 100)
Console.SetCursorPosition(0I, Console.CursorTop)
Console.Write(progress.ToString("00.00") & " %")
Next
Console.WriteLine()
Console.WriteLine("Finished.")
Console.ReadLine()
End Sub
End Module
1rst you have to know how many lines you will expect.
In your loop calculate "intLineCount / 100 * intCurrentLine"
int totalLines = 0 // "GetTotalLines"
int currentLine = 0;
foreach (line in Lines)
{
/// YOUR OPERATION
currentLine ++;
int progress = totalLines / 100 * currentLine;
///print out the result with the suggested method...
///!Caution: if there are many updates consider to update the output only if the value has changed or just every n loop by using the MOD operator or any other useful approach ;)
}
and print the result on the same posititon in your loop by using the SetCursor method
MSDN Console.SetCursorPosition
VB.NET:
Dim totalLines as Integer = 0
Dim currentLine as integer = 0
For Each line as string in Lines
' Your operation
currentLine += 1I
Dim Progress as integer = (currentLine / totalLines) * 100
' print out the result with the suggested method...
' !Caution: if there are many updates consider to update the output only if the value has changed or just every n loop by using the MOD operator or any other useful approach
Next
Well The easiest way is to update the progressBar variable often,
Ex: if your code consist of around 100 lines or may be 100 functionality
after each function or certain lines of code update progressbar variable with percentage :)

A loop exits prematurely when using StreamReader

It's been a long time since I've programmed. I'm writing a form in VB.NET, and using StreamReader to read a text file and populate an 2D array. Here is the text file:
あかさたなはまやらわん
いきしちにひみ り
うくすつぬふむゆる
えけせてねへめ れ
おこそとのほもよろを
And here is the loop, which is within the Load event.
Dim Line As String
Dim Row As Integer = 0
Using sReader As New IO.StreamReader("KanaTable.txt")
Do
Line = sReader.ReadLine
For i = 0 To Line.Length - 1
KanaTable(Row, i) = Line(i)
Next
Row += 1
Loop Until sReader.EndOfStream
End Using
The problem is, once the i in the For Loop reaches 10, it completes the loop and skips the other lines, even when I have a breakpoint. Can you let me know what's probably going on here?
I've figured out the problem, it was very simple. The array declaration for KanaTable:
Dim KanaTable(4, 9) As Char
should have been
Dim KanaTable(4, 10) As Char
Because there was one less space in the array than there should have been, the debugger must have been throwing an IndexOutOfRange which I couldn't see, because, stupid Windows bug (thanks to Bradley Uffner for pointing out this bug.)
If you can use an array of arrays or a list of arrays (List(Of Char())), you can get this down to a single line of code:
Dim KanaTable()() As Char = IO.File.ReadLines("KanaTable.txt").Select(Function(line) line.ToCharArray()).ToArray()
If that's too complicated for you, we can at least simplify the existing code:
Dim KanaTable As New List(Of Char())
Dim Line As String
Using sReader As New IO.StreamReader("KanaTable.txt")
Line = sReader.ReadLine()
While Line IsNot Nothing
KanaTable.Add(Line.ToCharArray())
Line = sReader.ReadLine()
End While
End Using
I can't see an error immediately, but you could try to adapt your code to this:
Using reader As New IO.StreamReader("KanaTable.txt")
Do
line= reader.ReadLine()
If line = Nothing Then
Exit Do
End If
For i = 0 To Line.Length - 1
KanaTable(Row, i) = Line(i)
Next
Row += 1
Loop
End Using

Start reading massive text file from the end

I would ask if you could give me some alternatives in my problems.
basically I'm reading a .txt log file averaging to 8 million lines. Around 600megs of pure raw txt file.
I'm currently using streamreader to do 2 passes on those 8 million lines doing sorting and filtering important parts in the log file, but to do so, My computer is taking ~50sec to do 1 complete run.
One way that I can optimize this is to make the first pass to start reading at the end because the most important data is located approximately at the final 200k line(s) . Unfortunately, I searched and streamreader can't do this. Any ideas to do this?
Some general restriction
# of lines varies
size of file varies
location of important data varies but approx at the final 200k line
Here's the loop code for the first pass of the log file just to give you an idea
Do Until sr.EndOfStream = True 'Read whole File
Dim streambuff As String = sr.ReadLine 'Array to Store CombatLogNames
Dim CombatLogNames() As String
Dim searcher As String
If streambuff.Contains("CombatLogNames flags:0x1") Then 'Keyword to Filter CombatLogNames Packets in the .txt
Dim check As String = streambuff 'Duplicate of the Line being read
Dim index1 As Char = check.Substring(check.IndexOf("(") + 1) '
Dim index2 As Char = check.Substring(check.IndexOf("(") + 2) 'Used to bypass the first CombatLogNames packet that contain only 1 entry
If (check.IndexOf("(") <> -1 And index1 <> "" And index2 <> " ") Then 'Stricter Filters for CombatLogNames
Dim endCLN As Integer = 0 'Signifies the end of CombatLogNames Packet
Dim x As Integer = 0 'Counter for array
While (endCLN = 0 And streambuff <> "---- CNETMsg_Tick") 'Loops until the end keyword for CombatLogNames is seen
streambuff = sr.ReadLine 'Reads a new line to flush out "CombatLogNames flags:0x1" which is unneeded
If ((streambuff.Contains("---- CNETMsg_Tick") = True) Or (streambuff.Contains("ResponseKeys flags:0x0 ") = True)) Then
endCLN = 1 'Value change to determine end of CombatLogName packet
Else
ReDim Preserve CombatLogNames(x) 'Resizes the array while preserving the values
searcher = streambuff.Trim.Remove(streambuff.IndexOf("(") - 5).Remove(0, _
streambuff.Trim.Remove(streambuff.IndexOf("(")).IndexOf("'")) 'Additional filtering to get only valuable data
CombatLogNames(x) = search(searcher)
x += 1 '+1 to Array counter
End If
End While
Else
'MsgBox("Something went wrong, Flame the coder of this program!!") 'Bug Testing code that is disabled
End If
Else
End If
If (sr.EndOfStream = True) Then
ReDim GlobalArr(CombatLogNames.Length - 1) 'Resizing the Global array to prime it for copying data
Array.Copy(CombatLogNames, GlobalArr, CombatLogNames.Length) 'Just copying the array to make it global
End If
Loop
You CAN set the BaseStream to the desired reading position, you just cant set it to a specfic LINE (because counting lines requires to read the complete file)
Using sw As New StreamWriter("foo.txt", False, System.Text.Encoding.ASCII)
For i = 1 To 100
sw.WriteLine("the quick brown fox jumps ovr the lazy dog")
Next
End Using
Using sr As New StreamReader("foo.txt", System.Text.Encoding.ASCII)
sr.BaseStream.Seek(-100, SeekOrigin.End)
Dim garbage = sr.ReadLine ' can not use, because very likely not a COMPLETE line
While Not sr.EndOfStream
Dim line = sr.ReadLine
Console.WriteLine(line)
End While
End Using
For any later read attempt on the same file, you could simply save the final position (of the basestream) and on the next read to advance to that position before you start reading lines.
What worked for me was skipping first 4M lines (just a simple if counter > 4M surrounding everything inside the loop), and then adding background workers that did the filtering, and if important added the line to an array, while main thread continued reading the lines. This saved about third of the time at the end of a day.

Is there any way I can speed up this VBA algorithm?

I am looking to implement a VBA trie-building algorithm that is able to process a substantial English lexicon (~50,000 words) in a relatively short amount of time (less than 15-20 seconds). Since I am a C++ programmer by practice (and this is my first time doing any substantial VBA work), I built a quick proof-of-concept program that was able to complete the task on my computer in about half a second. When it came time to test the VBA port however, it took almost two minutes to do the same -- an unacceptably long amount of time for my purposes. The VBA code is below:
Node Class Module:
Public letter As String
Public next_nodes As New Collection
Public is_word As Boolean
Main Module:
Dim tree As Node
Sub build_trie()
Set tree = New Node
Dim file, a, b, c As Integer
Dim current As Node
Dim wordlist As Collection
Set wordlist = New Collection
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
Do While Not EOF(file)
Dim line As String
Line Input #file, line
wordlist.add line
Loop
For a = 1 To wordlist.Count
Set current = tree
For b = 1 To Len(wordlist.Item(a))
Dim match As Boolean
match = False
Dim char As String
char = Mid(wordlist.Item(a), b, 1)
For c = 1 To current.next_nodes.Count
If char = current.next_nodes.Item(c).letter Then
Set current = current.next_nodes.Item(c)
match = True
Exit For
End If
Next c
If Not match Then
Dim new_node As Node
Set new_node = New Node
new_node.letter = char
current.next_nodes.add new_node
Set current = new_node
End If
Next b
current.is_word = True
Next a
End Sub
My question then is simply, can this algorithm be sped up? I saw from some sources that VBA Collections are not as efficient as Dictionarys and so I attempted a Dictionary-based implementation instead but it took an equal amount of time to complete with even worse memory usage (500+ MB of RAM used by Excel on my computer). As I say I am extremely new to VBA so my knowledge of both its syntax as well as its overall features/limitations is very limited -- which is why I don't believe that this algorithm is as efficient as it could possibly be; any tips/suggestions would be greatly appreciated.
Thanks in advance
NB: The lexicon file referred to by the code, "corncob_caps.txt", is available here (download the "all CAPS" file)
There are a number of small issues and a few larger opportunities here. You did say this is your first vba work, so forgive me if I'm telling you things you already know
Small things first:
Dim file, a, b, c As Integer declares file, a and b as variants. Integer is 16 bit sign, so there may be risk of overflows, use Long instead.
DIM'ing inside loops is counter-productive: unlike C++ they are not loop scoped.
The real opportunity is:
Use For Each where you can to iterate collections: its faster than indexing.
On my hardware your original code ran in about 160s. This code in about 2.5s (both plus time to load word file into the collection, about 4s)
Sub build_trie()
Dim t1 As Long
Dim wd As Variant
Dim nd As Node
Set tree = New Node
' Dim file, a, b, c As Integer : declares file, a, b as variant
Dim file As Integer, a As Long, b As Long, c As Long ' Integer is 16 bit signed
Dim current As Node
Dim wordlist As Collection
Set wordlist = New Collection
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
' no point in doing inside loop, they are not scoped to the loop
Dim line As String
Dim match As Boolean
Dim char As String
Dim new_node As Node
Do While Not EOF(file)
'Dim line As String
Line Input #file, line
wordlist.Add line
Loop
t1 = GetTickCount
For Each wd In wordlist ' for each is faster
'For a = 1 To wordlist.Count
Set current = tree
For b = 1 To Len(wd)
'Dim match As Boolean
match = False
'Dim char As String
char = Mid$(wd, b, 1)
For Each nd In current.next_nodes
'For c = 1 To current.next_nodes.Count
If char = nd.letter Then
'If char = current.next_nodes.Item(c).letter Then
Set current = nd
'Set current = current.next_nodes.Item(c)
match = True
Exit For
End If
Next nd
If Not match Then
'Dim new_node As Node
Set new_node = New Node
new_node.letter = char
current.next_nodes.Add new_node
Set current = new_node
End If
Next b
current.is_word = True
Next wd
Debug.Print "Time = " & GetTickCount - t1 & " ms"
End Sub
EDIT:
loading the word list into a dynamic array will reduce load time to sub second. Be aware that Redim Preserve is expensive, so do it in chunks
Dim i As Long, sz As Long
sz = 10000
Dim wordlist() As String
ReDim wordlist(0 To sz)
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
i = 0
Do While Not EOF(file)
'Dim line As String
Line Input #file, line
wordlist(i) = line
i = i + 1
If i > sz Then
sz = sz + 10000
ReDim Preserve wordlist(0 To sz)
End If
'wordlist.Add line
Loop
ReDim Preserve wordlist(0 To i - 1)
then loop through it like
For i = 0 To UBound(wordlist)
wd = wordlist(i)
I'm out of practice with VBA, but IIRC, iterating the Collection using For Each should be a bit faster than going numerically:
Dim i As Variant
For Each i In current.next_nodes
If i.letter = char Then
Set current = i
match = True
Exit For
End If
Next node
You're also not using the full capabilities of Collection. It's a Key-Value map, not just a resizeable array. You might get better performance if you use the letter as a key, though looking up a key that isn't present throws an error, so you have to use an ugly error workaround to check for each node. The inside of the b loop would look like:
Dim char As String
char = Mid(wordlist.Item(a), b, 1)
Dim node As Node
On Error Resume Next
Set node = Nothing
Set node = current.next_nodes.Item(char)
On Error Goto 0
If node Is Nothing Then
Set node = New Node
current.next_nodes.add node, char
Endif
Set current = node
You won't need the letter variable on class Node that way.
I didn't test this. I hope it's all right...
Edit: Fixed the For Each loop.
Another thing you can do which will possibly be slower but will use less memory is use an array instead of a collection, and resize with each added element. Arrays can't be public on classes, so you have to add methods to the class to deal with it:
Public letter As String
Private next_nodes() As Node
Public is_word As Boolean
Public Sub addNode(new_node As Node)
Dim current_size As Integer
On Error Resume Next
current_size = UBound(next_nodes) 'ubound throws an error if the array is not yet allocated
On Error GoTo 0
ReDim next_nodes(0 To current_size) As Node
Set next_nodes(current_size) = new_node
End Sub
Public Function getNode(letter As String) As Node
Dim n As Variant
On Error Resume Next
For Each n In next_nodes
If n.letter = letter Then
Set getNode = n
Exit Function
End If
Next
End Function
Edit: And a final optimization strategy, get the Integer char value with the Asc function and store that instead of a String.
You really need to profile it, but if you think Collections are slow maybe you can try using dynamic arrays?