Loading big text files

I have to load a text file into my software. The file can be really big (at least 1.5 GB), and I need to read its last line to enumerate some elements in the next part of the script. Loading can take a very long time, and sometimes it fails altogether with the following error:
System.OutOfMemoryException: 'Array dimensions exceeded supported range.'
Is there a way to solve this issue? Or maybe a different - and better - path that I can follow to do what I need?
EDIT I:
Here are more details:
I generate the aforementioned text file from a batch script, which my software runs when a button is pressed.
Since I need to read a number contained in the last row of the generated text file, I load the file into the software. This also happens on a button press, and in the corresponding Sub I use the following command:
Dim path as string
path = "C:\textfile.txt"
RichTextBox1.LoadFile(path, RichTextBoxStreamType.PlainText)

If you only need the last line of the file...
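' Needs Imports System.IO and Imports System.Linq.
' ReadLines streams the file line by line instead of loading it all into memory.
Dim LastLine = File.ReadLines("file.txt").Last()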
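File.ReadLines(...).Last() still has to read the whole 1.5 GB once. If that is too slow, one alternative is to scan backward from the end of the file. The following is only a sketch under some assumptions: the module and function names are made up for illustration, the file is ASCII or UTF-8, and the last line is small enough to fit in memory.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module LastLineReader
    ' Hypothetical helper: returns the last non-empty line of a file by
    ' scanning backward from the end, so the rest of the file is never read.
    ' Assumes ASCII or UTF-8, where byte 10 always means line feed.
    Public Function ReadLastLine(path As String) As String
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            Dim pos As Long = fs.Length

            ' Skip trailing CR/LF bytes so a final newline does not give an empty line.
            While pos > 0
                fs.Position = pos - 1
                Dim b As Integer = fs.ReadByte()
                If b <> 10 AndAlso b <> 13 Then Exit While
                pos -= 1
            End While
            Dim lineEnd As Long = pos

            ' Walk back to the previous line feed (or to the start of the file).
            While pos > 0
                fs.Position = pos - 1
                If fs.ReadByte() = 10 Then Exit While
                pos -= 1
            End While

            ' Read the bytes between the two positions and decode them.
            Dim bytes(CInt(lineEnd - pos) - 1) As Byte
            fs.Position = pos
            Dim total As Integer = 0
            While total < bytes.Length
                Dim n As Integer = fs.Read(bytes, total, bytes.Length - total)
                If n = 0 Then Exit While
                total += n
            End While
            Return Encoding.UTF8.GetString(bytes, 0, total)
        End Using
    End Function
End Module
```

Usage would be something like Dim lastLine = LastLineReader.ReadLastLine("C:\textfile.txt"). Reading one byte at a time is slow per byte, but it only touches the final line, so the cost does not depend on the file size.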

Related

VB.NET open a Text File within the same folder as my Forms without writing Full Path

I found a similar question but it was 5 years 8 months old, had 2 replies and neither worked for me (VB.Net Read txt file from current directory)
My issue is that when I use the following code:
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText(Application.StartupPath & "\Username_And_Password_Raw.txt")
Dim usernameAndPassword = Split(fileReader, ",")
I get an error saying:
System.IO.FileNotFoundException: 'Could not find file 'C:\Users\wubsy\source\repos\NEA Stock Page System\NEA Stock Page System\bin\Debug\net6.0-windows\Username_And_Password_Raw.txt'.'
I have tried all the different Application.*Path properties I can find (e.g. StartupPath, CommonAppDataPath, etc.) and they all return essentially the same error, only with a different location.
This is the folder layout of my TXT file. I know it's a terrible, incredibly insecure and generally awful way of storing login information, but this is just for an NEA, so it will never actually be used.
This is the actual path of the TXT File if it helps
C:\Users\wubsy\source\repos\NEA Stock Page System\NEA Stock Page System\Username_And_Password_Raw.txt

The startup path is where your exe is located. The exe and all supporting files get copied to a binary directory when you compile in Visual Studio; in your case:
C:\Users\wubsy\source\repos\NEA Stock Page System\NEA Stock Page System\bin\Debug\net6.0-windows
What you're trying to do, referencing the file where it sits in your solution, is probably not the best way to do it. Your code above will work (with one change, mentioned below) if you change the properties of the file in the solution.
Right click on the file in the Solution Explorer Username_And_Password_Raw.txt, select Properties. Modify Copy to Output Directory to either Copy always / Copy if newer, depending on your requirement. Now that file will copy to the same directory your exe is in, and the code above should work.
Note, when creating a path, don't use string concatenation because you may have too many or too few \; use Path.Combine:
Dim filePath = Path.Combine(Application.StartupPath, "Username_And_Password_Raw.txt")
Dim fileContents = My.Computer.FileSystem.ReadAllText(filePath)

Loading Large file into String Variable VB.NET

Question, if anyone could help, please: the file I am reading from inPath is very large, 300 MB to 1 GB or more. I need to load the file into the variable wholeFile, as shown in the program below. Files of approximately 200 MB work fine, but larger files fail with an OutOfMemoryException. The purpose is that, once the file is loaded into the variable, I need to run a RegEx to pick out certain sections of the file and save them somewhere else. Thanks once again for your kind attention.
Dim inPath As String = "C:\temp\300MB-File.txt"
Dim outPath As String = "C:\temp\myFileNew2.txt"
Dim wholeFile As String = ""
Using sw As StreamWriter = File.CreateText(outPath)
    For Each oneLine As String In File.ReadLines(inPath)
        sw.WriteLine(oneLine)
        wholeFile = wholeFile & vbCrLf & oneLine
    Next
End Using

The way you're doing that is abominable. Why would you read a file line by line if your purpose is to store the entire contents in a single variable? Why wouldn't you load the whole file in one go?
Dim fileContents = File.ReadAllText(filePath)
That may still have memory issues with large files, but the way you're doing it will use far more memory and time. Each time you concatenate onto the String, you create a new String object and copy the previous contents into it along with the new text. That means that, for a file with N lines, you are going to create N Strings: the first contains the first line, the second contains the first two lines, the third contains the first three lines, and so on. The total amount of copying therefore grows roughly with the square of the file size, and the discarded intermediate Strings put heavy pressure on the garbage collector.
If you really want to read the file line by line then you could use a StringBuilder, which avoids so much memory reallocation. Even better would be to get the size of the file first and then create the StringBuilder with the appropriate capacity from the get go, so no reallocation would be needed at all.
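A rough sketch of the pre-sized StringBuilder idea might look like the following. The module and function names are invented for illustration, and the file's byte length is used only as an estimate of the character count. For files around 1 GB the initial allocation can itself fail, which is the same memory limit described here.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module PresizedLoader
    ' Hypothetical helper: reads a file line by line into a StringBuilder
    ' whose capacity is estimated up front from the file size.
    Public Function LoadWithStringBuilder(path As String) As String
        ' Byte length as an estimate of the number of characters needed.
        Dim byteLength As Long = New FileInfo(path).Length
        Dim capacity As Integer = CInt(Math.Min(byteLength, CLng(Integer.MaxValue)))

        Dim sb As New StringBuilder(capacity)
        For Each oneLine As String In File.ReadLines(path)
            ' AppendLine adds Environment.NewLine after each line.
            sb.AppendLine(oneLine)
        Next
        Return sb.ToString()
    End Function
End Module
```

Note that sb.ToString() makes one more full copy, so peak memory is still roughly twice the text size.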
When you get right down to it though, files of that size are going to be an issue no matter what. You will either need to ensure that enough memory is allocated to your app to handle it or else you'll have to break the file up into chunks and process each chunk separately. If your regex won't match very large portions of the file then you can simply make each chunk overlap by a line or two and then handle the special cases where you get duplicate matches in the overlapping section.
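The overlapping-chunk idea could be sketched roughly as below. This is one way to do it, not the only one: the names FindMatches, linesPerChunk and overlapLines are made up, and it assumes no single match spans more than overlapLines + 1 lines. Instead of filtering duplicates afterwards, each chunk only reports matches that start before its trailing overlap, and the overlap lines are carried into the next chunk, so every match is reported exactly once.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ChunkedRegex
    ' Hypothetical helper: runs a regex over a large file a chunk of lines at a time.
    ' Lines are joined with vbLf, so patterns should match "\n" rather than "\r\n".
    Public Iterator Function FindMatches(path As String, pattern As Regex,
                                         linesPerChunk As Integer,
                                         overlapLines As Integer) As IEnumerable(Of String)
        If overlapLines < 0 OrElse overlapLines >= linesPerChunk Then
            Throw New ArgumentException("overlapLines must be >= 0 and < linesPerChunk.")
        End If

        Dim window As New List(Of String)
        Using reader As New StreamReader(path)
            Dim atEnd As Boolean = False
            While Not atEnd
                ' Top the window up to linesPerChunk lines.
                While window.Count < linesPerChunk
                    Dim line As String = reader.ReadLine()
                    If line Is Nothing Then
                        atEnd = True
                        Exit While
                    End If
                    window.Add(line)
                End While
                If window.Count = 0 Then Exit While

                ' The last overlapLines lines are carried over, unless this is the final chunk.
                Dim keep As Integer = If(atEnd, 0, Math.Min(overlapLines, window.Count))
                Dim ownedLines As Integer = window.Count - keep

                ' Character offset at which the carried-over tail begins.
                Dim ownedChars As Integer = 0
                For i As Integer = 0 To ownedLines - 1
                    ownedChars += window(i).Length + 1
                Next

                Dim text As String = String.Join(vbLf, window)
                For Each m As Match In pattern.Matches(text)
                    ' Matches starting in the tail are left for the next chunk.
                    If m.Index < ownedChars Then
                        Yield m.Value
                    End If
                Next

                window.RemoveRange(0, ownedLines)
            End While
        End Using
    End Function
End Module
```

Each match can then be written to the output file as it is found, for example inside a For Each over FindMatches(inPath, New Regex("..."), 10000, 2), so the whole file never has to sit in one String.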

Either correct VB6/DOS/SQL function or suggest better alternative

I am currently reprogramming code made long ago by a less than skilled programmer. Granted the code works, and has for a number of years but it is not very efficient.
The code in question is in VB6 with SQL calls. It checks a particular directory on the drive (in this example we will use c:\files) and, if a file exists, it moves the file to the processing directory, loads the parameters for that particular file and processes them accordingly.
Currently the code uses the DIR function in VB6 to identify a file in the appropriate directory. The only problem is that if a number of files exist in the directory, it is a crap shoot whether it will grab the 5 KB file and process it in 3 seconds or grab the 500,000 KB file and not process any others for the next 10 minutes.
I searched many message boards for a way to have it pick the smallest file and found I could build a complicated array to perform something similar to a sort, but I decided to try alternative ideas instead, hoping to reduce the processing time involved. Using ancient DOS knowledge I created something that should work but for some reason does not (hence posting here).
I made a batch file that we will call c:\test.bat which contained the following lines:
del c:\test.txt
dir /OS /B c:\files\*.txt>c:\test.txt
This deletes any existing test.txt, then redirects a bare directory listing (no headers), sorted by file size from smallest to largest, into c:\test.txt.
I then inserted the following code into the pre-existing code at the beginning:
Shell "c:\test.bat", vbHide
filepath = "c:\test.txt"
Open filepath For Input As #1
Input #1, filegrabber
Close #1
When I step through the code I can see that this works correctly, except now later on in the code I get a
Runtime error 91 Object variable or with block variable not set
in regard to assigning a FileSystemObject. Am I correct in guessing that FSO and Shell do not work well together? Also, if you can suggest a better alternative for getting the smallest file from an existing directory, suggestions are appreciated.

No need for sorting.
Just use Dir() to cruise through the directory. Before the loop set a Smallest Long variable to &H7FFFFFFF then inside the loop test each returned file name using the FileLen() function.
If FileLen() returns a value less than Smallest assign that size to Smallest and assign that file name to SmallestFile a String variable.
Upon loop exit if Smallest = &H7FFFFFFF there were no files, otherwise SmallestFile has your file name.
Seems incredibly simple, what am I missing?
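For illustration, the loop described above could look roughly like this. It is written in VB.NET to match the other examples on this page rather than in VB6, using the Dir() and FileLen() functions from Microsoft.VisualBasic, which mirror the VB6 ones. The module and function names are made up, and Long.MaxValue stands in for the &H7FFFFFFF sentinel because FileLen returns a 64-bit Long in VB.NET.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module SmallestFileFinder
    ' Hypothetical helper: returns the full path of the smallest file matching
    ' pattern in folder, or Nothing when no file matches.
    Public Function FindSmallestFile(folder As String, pattern As String) As String
        Dim smallest As Long = Long.MaxValue
        Dim smallestFile As String = Nothing

        ' The first Dir call takes the pattern; later calls without
        ' arguments continue the same listing.
        Dim name As String = Dir(Path.Combine(folder, pattern))
        While Not String.IsNullOrEmpty(name)
            Dim fullName As String = Path.Combine(folder, name)
            Dim size As Long = FileLen(fullName)
            If size < smallest Then
                smallest = size
                smallestFile = fullName
            End If
            name = Dir()
        End While

        Return smallestFile
    End Function
End Module
```

A call such as FindSmallestFile("c:\files", "*.txt") would replace the batch file and the temporary test.txt entirely. The VB6 version has the same shape, with the variables declared up front and the sentinel check after the loop.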

Another approach is to use the FileSystemObject's Files collection. Just iterate the files collection for a given folder and evaluate each File object's Size property. So long as you don't have a million files in a folder or something, performance should be fine.

Using ReadLine, where did my text go?

I'm pretty new to visual basic (and coding in general) so if I've made any really simple mistakes let me know.
Right now, I'm getting a pretty weird problem with my vb.net code.
The filestream is able to correctly open the file and read from it, but what's weird is that the code reads a bunch of lines from the beginning of the file that I don't see when I manually open the file in Notepad. Here's the code:
Dim fs, f, s 'filesystemobject, file, stream.
fs = CreateObject("Scripting.FileSystemObject")
f = fs.GetFile(CurrDataPath) ' This change made to ensure the correct file is opened
s = f.OpenAsTextStream(1, 0) ' 1 = ForReading, 0 = as ASCII (which i think is right?)
Dim param(14) As String
Dim line As String
line = s.ReadLine()
While i <= 14
    i += 1
    MessageBox.Show(line)
    line = s.ReadLine()
End While
(I've read that arrays are a bad idea but they've been convenient and haven't caused me any problems so I've been using them anyways.)
What's weird is that when this code is run, it will (in the message boxes) show me the information I want to see - which isn't bad at all. The information that I want looks like this:
BEGINPARAM
parameter1, 0
parameter2, 7.5
ENDPARAM
EDIT:
After using Path.GetFullPath(DFile), I found that there were two files in different directories with the same name DFile. The file I had been opening in Notepad was saved in the directory where I expected it to be saved, while the file the code was reading was saved in the VB project's folder.
Once I changed the code to rely on CurrDataPath which includes the expected path, the code read from the file exactly what I did in notepad.
I do have word wrap on in notepad, so I know that's not the issue, however, I will look into getting notepad++.
The file named DFile is created in a c++ program that I'll be digging through to find out why one part of the file is written to a different folder than the rest.
Obviously I'm missing something important, and if anyone could help, that would be great.
*Note: This is a vb6 migration project so if anyone asks I can provide the old code.

Assuming the most recent version of VB.Net, the modern way to write that is like this:
For Each line As String In File.ReadLines(CurrDataPath).Take(14)
    MessageBox.Show(line)
Next

I'm not 100% clear on what you're saying. There's nothing in this code that outputs to a file, so what you have to be saying is that when you open the file referenced by "DFile" on line 3 above, that file doesn't have the lines containing "parameter1, 0" and "parameter2, 7.5" in it?
Since we know that's not technically possible, do verify the answer to the question above and make sure you're really opening the same file in notepad as the script is opening. The second thing to do is to turn on Word Wrap in Notepad or download Notepad++ (a text editor I think everyone should have anyway) and make sure that the data's actually missing, and not just not showing on your screen because it's not using Windows style line endings.
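One quick way to check which file the code is really opening is to resolve the name to an absolute path and compare it with the path shown in Notepad's title bar or Save As dialog. A small sketch, with a made-up module and function name:

```vbnet
Imports System
Imports System.IO

Module PathCheck
    ' Hypothetical helper: reports the absolute path a file name resolves to
    ' (relative names resolve against the current working directory)
    ' and whether a file exists there.
    Public Function DescribeFile(fileName As String) As String
        Dim fullPath As String = Path.GetFullPath(fileName)
        Return fullPath & " (exists: " & File.Exists(fullPath).ToString() & ")"
    End Function
End Module
```

Showing the result of PathCheck.DescribeFile(DFile) in a message box before opening the file makes it obvious when a bare file name is being resolved against the project folder instead of the folder you expect.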

Finding a line in a text file via an integer

I am trying to write some code that will generate a random number and then check the numbers on each line of a text file to see if it has already been generated. I have everything except the code that checks the text file for the generated number. Any ideas?
Here is the code I have so far:
Dim Rlo As New IO.StreamReader("C:\Users\Somebody\Documents\Visual Studio 2012\Projects\RobloxRecruitV1\RobloxRecruitV1\bin\Debug\" & TheFileName.Text & ".txt")
Dim firstLine As String
'read first line
firstLine = Rlo.ReadLine()
'read secondline
TheText.Text = Rlo.ReadLine()
rndnumber = New Random
number = rndnumber.Next(firstLine, TheText.Text)
TextBox1.Text = number.ToString

I can't give you the exact code (it's been a long time since I did anything in VB6), but I can tell you that using a stream reader is the wrong approach.
A stream reader is exactly what its name suggests: a constant stream of data that starts and then stops when it reaches an end.
Now while it's true that you can to a small extent seek back and forth in a stream, that's not really what you need in this case.
What you need is to load all the lines of your file into an in memory array or some kind of hash table, then your task simply becomes one of looking to see if a given index exists.
If you have no choice but to use the file as is on disk (Due to size restrictions for example) then the approach you need is this:
1) Open the file
2) Set your position to the beginning
3) Enter a loop reading sequential lines
4) Once you reach the line that corresponds to the count you're looking for, stop reading
5) Otherwise keep looping until there are no more lines left
6) Close the file
Opening, closing and resetting each time is important, so that you KNOW EXACTLY where in the file you're starting from each time. You could in theory keep the file open and just reset the position, but that, in my mind, could be dangerous, especially if you have other processes writing to it.
If your file is not very big, then I'd opt for an in memory approach, load the file, perform operations on the in memory array of lines, then save it before exit.
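A minimal sketch of the in-memory approach, in VB.NET, could look like the following. The module and function names are invented, and it assumes the file holds one integer per line (lines that do not parse as integers are skipped).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module UniqueNumbers
    ' Hypothetical helper: loads every line that parses as an integer
    ' into a hash set, so lookups do not need to rescan the file.
    Public Function LoadUsedNumbers(path As String) As HashSet(Of Integer)
        Dim used As New HashSet(Of Integer)
        For Each line As String In File.ReadLines(path)
            Dim value As Integer
            If Integer.TryParse(line.Trim(), value) Then
                used.Add(value)
            End If
        Next
        Return used
    End Function

    ' Hypothetical helper: draws random numbers in [minValue, maxValue)
    ' until one is not in the set, records it and returns it.
    Public Function NextUnused(rng As Random, used As HashSet(Of Integer),
                               minValue As Integer, maxValue As Integer) As Integer
        ' Guard against looping forever when the whole range is taken.
        Dim inRange As Long = 0
        For Each n As Integer In used
            If n >= minValue AndAlso n < maxValue Then inRange += 1
        Next
        If inRange >= CLng(maxValue) - CLng(minValue) Then
            Throw New InvalidOperationException("Every number in the range is already used.")
        End If

        Dim candidate As Integer
        Do
            candidate = rng.Next(minValue, maxValue)
        Loop While used.Contains(candidate)

        used.Add(candidate)
        Return candidate
    End Function
End Module
```

After picking a number, appending it to the file (for example with File.AppendAllText) keeps the file on disk in step with the set in memory.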
