StringBuilder in .net not writing entire file - vb.net

I'm trying to write a text file. Here's my main,slightly clipped for clarity:
Private Sub WriteProperty(FilePath As String)
Try
SB = New StringBuilder
WriteConfig()
'a bunch of methods similar to WriteConfig here...
Dim File As New System.IO.StreamWriter(FilePath)
File.WriteLine(SB.ToString())
Catch ex As Exception
Dim X As Integer = 5 'cheesy way to add a breakpoint
End Try
End Sub
And here is one of about a dozen subs that add text to the file:
Private Sub WriteConfig()
Dim TempSB As New StringBuilder
TempSB.AppendLine("[CONFIG]")
TempSB.AppendLine("[END_CONFIG]")
TempSB.AppendLine("")
SB.Append(TempSB)
End Sub
There are about a dozen methods that add things like this, most of them add about 2k of text instead of the couple of lines in this example. When I examine SB in the debugger the total result is a little over 15k long. But when I open the file, it's 12k, and the end is all missing - it cuts off in the middle of one of the strings. There is no exception raised.
I know SB has problems with lots of little appends, which is why I used the TempSB's in the subs, but it has the exact same problem, and if I add them directly to SB instead the only difference is the "break" occurs a few characters earlier.
Can anyone offer a suggestion as to what might be happening?

StreamWriter uses an internal buffer. You need to Close() your StreamWriter to force it to write the remaining buffered data to the file. Better yet, wrap it in a Using statement. That will call its Dispose(), which in turn calls its Close().
Using File As New System.IO.StreamWriter(FilePath)
File.WriteLine(SB.ToString())
End Using
There's a convenience method that will do this for you in a single line:
System.IO.File.WriteAllText(FilePath, SB.ToString())

Related

VB - Writing to a file, closing and Object existance

I am writing a program that writes data to a text file at different points in my code, for example in different subroutines, functions or at different parts of subroutines (being scattered around).
First, I Dim the file writer:
Dim CurrentHisWriter As System.IO.StreamWriter
I tell it where to write to:
CurrentHisWriter = New System.IO.StreamWriter("C:\ProgramData\Japanese Conjugation Helper\LastSearch.txt")
Then, I actually write things:
CurrentHisWriter.Writeline("thing to write")
The problem is that I have to change to a different subroutine and then keep on writing to a file, so I have to close the writer and then dim another one in another subroutine:
CurrentHisWriter.Close
NewSubroutine()
[NewSubroutine]:
Dim CurrentHisWriter As System.IO.StreamWriter
CurrentHisWriter = New System.IO.StreamWriter("C:\ProgramData\Japanese Conjugation Helper\LastSearch.txt")
But then when I do this, I gives me one of a couple errors:
The program is has an instance of the file running
Some thing to do with there being no object (I don't remember exactly)
What is a reliable way programming the writing to files without having to worry about closing the writer at every point I change subroutines. I'm not sure about how objects and instances work and so the only thing I can do now is make a catch loop around every single line that uses the "CurrentHisWriter.Writeline" but this isn't really working too.
I know my lack of knowledge in this doesn't help explain, but I tried my best.
The naive approach would be like:
Sub Main()
MethodA()
MethodB()
End Sub
Sub MethodA()
Log("Starting method A")
End Sub
Sub MethodB()
Log("Starting method B")
End Sub
Sub Log(message as String)
System.IO.File.AppendAllText("C:\temp\my.log", message)
End Sub
File.AppendAllText is pretty good at closing things off so you can subsequently write to it elsewhere
A better approach would be to have a class whose job it is to build this file, and it builds it all into a stringbuilder and then writes it once. Multiple of your methods use that class, build that file... The class can either implement some timed/periodic dumping of data to disk (if it's like logging, never ending, thousands of events per second.. but then perhaps you'd just use a logging framework rather than reinvent the wheel), or it has a write method that saves the rendering of it to disk
If there is another specialized application of your data at work here, for example if you're generating XML or JSON you should look at specific serialization approaches for those (wheels that have already been invented)
I use
FileOpen(1, "file.txt", OpenMode.Append)
Now you can write from any other subroutine
PrintLine(1, "text to write")
Until the file is closed
FileClose(1)
But maybe you could solve your problem this way:
Define CurrentHisWriter outside of subroutine as
Private CurrentHisWriter As System.IO.StreamWriter = ....
Then you won't have to close and reopen the writer, all your Subs and functions will have access to it.

Will Putting This Onto A background Worker Stop This Issue

I have been trying to fix this for a number of days now without any success. I know I have created another post related to this issue but not sure if I should have continued with the other post rather than creating a new one as I am still quite new to how SO works so apologies if I have gone about this the wrong way.
Objective
Read a text file from disk, output the contents to a Textbox so I can then extract the last 3 lines from it. This is the only way I can think of doing this.
The text file is continuously been updated by another running program but I can still read it even though it is in use but cannot write to it.
I am probing this file through a Timer which ticks every 1 second in order to get the latest information.
Now to the issue...
I have noticed that after some time my app becomes sluggish which is noticeable when I try to move it across the screen or resize it and the CPU usage starts to creep up to over 33%
My Thought Process
As this reading the file is a continuous one, I was thinking that I could move it onto a BackgroundWorker which from my understanding would put it on a different thread and take some load off the main GUI.
Am I barking up the wrong tree on this one?
I am reaching out to more advanced users before I start to get all the text books out on learning how to use the BackgroundWorker.
Here is the code I am using to Read the txt file and output it to a text Box. I have not included the code for extracting the last 3 lines because I don't think that part is causing the issue.
I think the issue is because I am constantly probing the source files every second with a timer but not 100% sure to be honest.
Dim strLogFilePath As String
strLogFilePath = "C:\DSD\data.txt"
Dim LogFileStream As FileStream
Dim LogFileReader As StreamReader
'Open file for reading
LogFileStream = New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
LogFileReader = New StreamReader(LogFileStream)
'populate text box with the contents of the txt file
Dim strRowText As String
strRowText = LogFileReader.ReadToEnd()
TextBox1.text = strRowText
'Clean Up
LogFileReader.Close()
LogFileStream.Close()
LogFileReader.Dispose()
LogFileStream.Dispose()
Firstly, you should use the Using keyword instead of manually disposing objects, because that way you are guaranteed that the object will get disposed, even if an unexpected exception occurs, for example:
' You can initialize variables in one line
Dim strLogFilePath As String = "C:\DSD\data.txt"
Using LogFileStream As New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
' Everything goes in here
End Using
You don't need the reader for my solution. The reading will be done manually.
Next, you need to read the last n lines (in your case, 3) of the stream. Reading the entire file when you're only interested in a few lines at the end is inefficient. Instead, you can start reading from the end until you've reached three (or any number of) line seprators (based on this answer):
Function ReadLastLines(
NumberOfLines As Integer, Encoding As System.Text.Encoding, FS As FileStream,
Optional LineSeparator As String = vbCrLf
) As String
Dim NewLineSize As Integer = Encoding.GetByteCount(LineSeparator)
Dim NewLineCount As Integer = 0
Dim EndPosition As Long = Convert.ToInt64(FS.Length / NewLineSize)
Dim NewLineBytes As Byte() = Encoding.GetBytes(LineSeparator)
Dim Buffer As Byte() = Encoding.GetBytes(LineSeparator)
For Position As Long = NewLineSize To EndPosition Step NewLineSize
FS.Seek(-Position, SeekOrigin.End)
FS.Read(Buffer, 0, Buffer.Length)
If Encoding.GetString(Buffer) = LineSeparator Then
NewLineCount += 1
If NewLineCount = NumberOfLines Then
Dim ReturnBuffer(CInt(FS.Length - FS.Position)) As Byte
FS.Read(ReturnBuffer, 0, ReturnBuffer.Length)
Return Encoding.GetString(ReturnBuffer)
End If
End If
Next
' Handle case where number of lines in file is less than NumberOfLines
FS.Seek(0, SeekOrigin.Begin)
Buffer = New Byte(CInt(FS.Length)) {}
FS.Read(Buffer, 0, Buffer.Length)
Return Encoding.GetString(Buffer)
End Function
Usage:
Using LogFileStream As New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
' Depending on system, you may need to supply an argument for the LineSeparator param
Dim LastThreeLines As String = ReadLastLines(3, System.Text.Encoding.UTF8, LogFileStream)
' Do something with the last three lines
MsgBox(LastThreeLines)
End Using
Note that I haven't tested this code, and I'm sure it can be improved. It may also not work for all encodings, but it sounds like it should be better than your current solution, and that it will work in your situation.
Edit: Also, to answer your question, IO operations should usually be performed asynchronously to avoid blocking the UI. You can do this using tasks or a BackgroundWorker. It probably won't make it faster, but it will make your application more responsive. It's best to indicate that something is loading before the task begins.
If you know when your file is being written to, you can set a flag to start reading, and then unset it when the last lines have been read. If it hasn't changed, there's no reason to keep reading it over and over.

IsNothing showing error VB.NET VS2017 Community

My first post here and really basic question as I have just started learning.
I am following a step by step tutorial from youtube. https://www.youtube.com/watch?v=6utWyl8agDY code is working alright in the video. below is the code:
Imports System
Imports System.IO
Module Program
Sub Main()
Dim myReader As StreamReader = New StreamReader("values.txt")
Dim line As String = ""
While Not IsNothing(line)
line = myReader.ReadLine()
If Not IsNothing(line) Then
Console.WriteLine(line)
End If
End While
myReader.Close()
Console.ReadLine()
End Sub
End Module
Problem I am facing is that I have red error (squiggly) line under IsNothing. One thing to note is the in the video tutorial version of VS under use is 2013 while I am using VS2017 community. any idea what I am doing wrong here?
I'm not sure what your tutorial says but there are issues with that code. It's fairly poorly structured and, to be honest, you shouldn't really use IsNothing anyway. The most immediate issue is where you're reading the text and that you're testing for Nothing twice. The code would be better written like this:
Dim myReader As New StreamReader("values.txt")
Dim line As String = myReader.ReadLine()
While line IsNot Nothing
Console.WriteLine(line)
line = myReader.ReadLine()
End While
myReader.Close()
Console.ReadLine()
Comparing directly to Nothing using Is or IsNot is the way to go and there's also no need for that check inside the loop.
If you want to get a bit more advanced, you can also create the StreamReader with a Using statement, so it gets closed implicitly:
Using myReader As New StreamReader("values.txt")
Dim line As String = myReader.ReadLine()
While line IsNot Nothing
Console.WriteLine(line)
line = myReader.ReadLine()
End While
End Using
Console.ReadLine()
Better still would be to not create the StreamReader yourself at all but let the Framework do that for you:
For Each line As String In File.ReadLines("values.txt")
Console.WriteLine(line)
End Using
Console.ReadLine()
Note that that code uses ReadLines rather than ReadAllLines. Both would work but the former will only read one line at a time while the latter will read the whole file first, load the lines into an array and return that. In this case it probably doesn't really matter which you use but ReadLines is generally preferable unless you specifically need all the lines first or you want to loop over them multiple times.
You have to decide whether your console application requires .net Core or .net Framework is sufficient.
As others have written in responses, your code is in need of improvement.
However, your code will run correctly if you choose the .net framework for your new project in Visual Studio 2017 Community as shown in the screenshot below.

VB.NET (2013) - Check string against huge file

I have a text file that is 125Mb in size, it contains 2.2 million records. I have another text file which doesn't match the original but I need to find out where it differs. Normally, with a smaller file I would read each line and process it in some way, or read the whole file into a string and do likewise, however the two files are too big for that and so I would like to create something to achieve my goal. Here's what I currently have.. excuse the mess of it.
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
Dim refOrig As String = refOriginalText.Text 'Original Reference File
Dim refLatest As String = refLatestText.Text 'Latest Reference
Dim srOriginal As StreamReader = New StreamReader(refOrig) 'start stream of original file
Dim srLatest As StreamReader = New StreamReader(refLatest) 'start stream of latest file
Dim recOrig, recLatest, baseDIR, parentDIR, recOutFile As String
baseDIR = vb.Left(refOrig, InStrRev(refOrig, ".ref") - 1) 'find parent folder
parentDIR = Path.GetDirectoryName(baseDIR) & "\"
recOutFile = parentDIR & "Updated.ref"
Me.Text = "Processing Reference File..." 'update the application
Update()
If Not File.Exists(recOutFile) Then
FileOpen(55, recOutFile, OpenMode.Append)
FileClose(55)
End If
Dim x As Integer = 0
Do While srLatest.Peek() > -1
Application.DoEvents()
recLatest = srLatest.ReadLine
recOrig = srOriginal.ReadLine ' check the original reference file
Do
If Not recLatest.Equals(recOrig) Then
recOrig = srOriginal.ReadLine
Else
FileOpen(55, recOutFile, OpenMode.Append)
Print(55, recLatest & Environment.NewLine)
FileClose(55)
x += 1
count.Text = "Record No: " & x
count.Refresh()
srOriginal.BaseStream.Seek(0, SeekOrigin.Begin)
GoTo 1
End If
Loop
1:
Loop
srLatest.Close()
srOriginal.Close()
FileClose(55)
End Sub
It's got poor programming and scary loops, but that's because I'm not a professional coder, just a guy trying to make his life easier.
Currently, this uses a form to insert the original file and the latest file and outputs each line that matches into a new file. This is less than perfect, but I don't know how to cope with the large file sizes as streamreader.readtoend crashes the program. I also don't need the output to be a copy of the latest input, but I don't know how to only output the records it doesn't find. Here's a sample of the records each file has:
doc:ARCHIVE.346CCBD3B06711E0B40E00163505A2EF
doc:ARCHIVE.346CE683B29811E0A06200163505A2EF
doc:ARCHIVE.346CEB15A91711E09E8900163505A2EF
doc:ARCHIVE.346CEC6AAA6411E0BEBB00163505A2EF
The program I have currently works... to a fashion, however I know there are better ways of doing it and I'm sure much better ways of using the CPU and memory, but I don't know this level of programming. All I would like is for you to take a look and offer your best answers to all or some of the code. Tell me what you think will make it better, what will help with one line, or all of it. I have no time limit on this because the code works, albeit slowly, I would just like someone to tell me where my code could be better and what I could do to get round the huge file sizes.
Your code is slow because it is doing a lot of file IO. You're on the right track by reading one line at a time, but this can be improved.
Firstly, I've created some test files based off the data that you provided. Those files contain three million lines and are about 130 MB in size (2.2 million records was less than 100 MB so I've increased the number of lines to get to the file size that you state).
Reading the entire file into a single string uses up about 600 MB of memory. Do this with two files (which I assume you were doing) and you have over 1GB of memory used, which may have been causing the crash (you don't say what error was shown, if any, when the crash occurred, so I can only assume that it was an OutOfMemoryException).
Here's a few tips before I go through your code:
Use Using Blocks
This won't help with performance, but it does make your code cleaner and easier to read.
Whenever you're dealing with a file (or anything that implements the IDisposable interface), it's always a good idea to use a Using statement. This will automatically dispose of the file (which closes the file), even if an error happens.
Don't use FileOpen
The FileOpen method is outdated (and even stated as being slow in its documentation). There are better alternatives that you are already (almost) using: StreamWriter (the cousin of StreamReader).
Opening and closing a file two million times (like you are doing inside your loop) won't be fast. This can be improved by opening the file once outside the loop.
DoEvents() is evil!
DoEvents is a legacy method from back in the VB6 days, and it's something that you really want to avoid, especially when you're calling it two million times in a loop!
The alternative is to perform all of your file processing on a separate thread so that your UI is still responsive.
Using a separate thread here is probably overkill, and there are a number of intricacies that you need to be aware of, so I have not used a separate thread in the code below.
So let's look at each part of your code and see what we can improve.
Creating the output file
You're almost right here, but you're doing some things that you don't need to do. GetDirectoryName works with file names, so there's no need to remove the extension from the original file name first. You can also use the Path.Combine method to combine a directory and file name.
recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
Reading the files
Since you're looping through each line in the "latest" file and finding a match in the "original" file, you can continue to read one line at a time from the "latest" file.
But instead of reading a line at a time from the "original" file, then seeking back to the start when you find a match, you will be better off reading all of those lines into memory.
Now, instead of reading the entire file into memory (which took up 600 MB as I mentioned earlier), you can read each line of the file into an array. This will use up less memory, and is quite easy to do thanks to the File class.
originalLines = File.ReadAllLines(refOrig)
This reads all of the lines from the file and returns a String array. Searching through this array for matches will be slow, so instead of reading into an array, we can read into a HashSet(Of String). This will use up a bit more memory, but it will be much faster to seach through.
originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))
Searching for matches
Since we now have all of the lines from the "original" line in an array or HashSet, searching for a line is very easy.
originalLines.Contains(recLatest)
Putting it all together
So let's put all of this together:
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs)
Dim refOrig As String
Dim refLatest As String
Dim recOutFile As String
Dim originalLines As HashSet(Of String)
refOrig = refOriginalText.Text 'Original Reference File
refLatest = refLatestText.Text 'Latest Reference
recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
Me.Text = "Processing Reference File..." 'update the application
Update()
originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))
Using latest As New StreamReader(refLatest),
updated As New StreamWriter(recOutFile, True)
Do
Dim line As String
line = latest.ReadLine()
' ReadLine returns Nothing when it reaches the end of the file.
If line Is Nothing Then
Exit Do
End If
If originalLines.Contains(line) Then
updated.WriteLine(line)
End If
Loop
End Using
End Sub
This uses around 400 MB of memory and takes about 4 seconds to run.

VB can't string to arraylist when read from file

Either I'm missing something really obvious or something about vb is really messed up. I'm trying to read in from a file and add the lines to an arraylist... pretty simple If I add strings to the arraylist this way
selectOptions.Add("Standard")
selectOptions.Add("Priority")
selectOptions.Add("3-Day")
selectOptions.Add("Overnight")
I have no problems
But when I do this it appears to end up empty which makes no sense to me.
Dim reader As StreamReader = My.Computer.FileSystem.OpenTextFileReader(path)
Dim line As String
Do
line = reader.ReadLine
selectOptions.Add(line)
Loop Until line Is Nothing
reader.Close()
Messagebox.show line all day so I know it is reading the file and the file isn't empty and I have checked the type of line which comes back as string. This makes no sense to me.
Checking for reader.EndOfStream in a While loop will probably work better:
Dim reader As New StreamReader(path)
Dim line As String
While Not reader.EndOfStream
line = reader.ReadLine
selectOptions.Add(line)
End While
reader.Close()
You can also get an exception if selectOptions isn't declared as a New ArrayList, if you properly have all your Options turned On.
Another thing to remember, if your code is in the form's Load Handler, it won't throw an exception it will just break out of the handler routine and load the form. This makes it really hard to find things like bad file names, badly declared objects, etc.
One thing I do is put suspect code in a button's Click handler and see what exceptions it throws there.
Of course this could all be moot if you use the File.ReadAllLines method and add it directly to the ArrayList:
selectOptions.AddRange(File.ReadAllLines(path))