Question, if anyone could help please: The file I am reading from inPath is very large 300MB to 1 GB +. I need to load the file into the variable wholeFile as shown in the below program. Approximately 200 MB files works fine but larger files bomb out (Out of Memory Exception Error). The purpose is once file is loaded into the variable, I would need to run RegEx and pick certain section of the file and save somewhere else. Thanks once again for your kind attention.
Dim inPath As String = "C:\temp\300MB-File.txt"
Dim outPath As String = "C:\temp\myFileNew2.txt"
Dim wholeFile as String = ""
Using sw As StreamWriter = File.CreateText(outPath)
For Each oneLine As String In File.ReadLines(inPath)
sw.WriteLine(oneLine)
wholeFile = wholeFile & vbCrLf & oneLine
Next
End Using
The way you're doing that is abominable. Why would you read a file line by line if your purpose is to store the entire contents in a single variable? Why wouldn't you load the whole file in one go?
Dim fileContents = File.ReadAllText(filePath)
That may still have memory issues with large files but the way you're doing will use exponentially more memory. Each time you do that concatenation to the String, you create a new String object and copy the previous contents into it along with the new text. That means that, for a file with N lines, you are going to create N Strings. The first will contain the first line, then the second will contain the first two lines, then the third will contain the first three lines, etc, etc.
If you really want to read the file line by line then you could use a StringBuilder, which avoids so much memory reallocation. Even better would be to get the size of the file first and then create the StringBuilder with the appropriate capacity from the get go, so no reallocation would be needed at all.
When you get right down to it though, files of that size are going to be an issue no matter what. You will either need to ensure that enough memory is allocated to your app to handle it or else you'll have to break the file up into chunks and process each chunk separately. If your regex won't match very large portions of the file then you can simply make each chunk overlap by a line or two and then handle the special cases where you get duplicate matches in the overlapping section.
Related
I have to load a textfile into my software, which can be really big (at least 1.5 GB), because I need to read the last line of this file to enumerate some elements with the next part of the script. The needed time can be very long but, sometimes, it is not possible to read the file because of the following error:
System.OutOfMemoryException: 'Array dimensions exceeded supported range.'
Is there a way to solve this issue? Or maybe a different - and better - path that I can follow to do what I need?
EDIT I:
Here follows more details:
I'm generating the aforementioned textfile from a batch-script which is run from the software of mine by pressing a button
Since I need to read a number contained in the last row of the generated textfile, I'm loading the file into the software, the pressure of a button is needed and for the relative Sub I'm using the following command:
Dim path as string
path = "C:\textfile.txt"
RichTextBox1.LoadFile(path, RichTextBoxStreamType.PlainText)
If you only need the last line of the file...
Dim LastLine = File.ReadLines("file.txt").Last()
I am currently reprogramming code made long ago by a less than skilled programmer. Granted the code works, and has for a number of years but it is not very efficient.
The code in question is in VB6 with SQL calls and it checks a particular directory on the drive (in this example we will use c:\files) and if a file exists, it moves the file to the processing directory loads the parameters for that particular file and processes them accordingly.
Currently the code uses the DIR function in VB6 to identify a file in the appropriate directory. The only problem is that if a number of files exist in the directory it is a crap shoot as to if it will grab the 5kb file and process it in 3 seconds or if it will grab the 500,000kb file and not process any others for the next 10 minutes.
I search many message boards to find some way to have it pick the smallest file and found I could build a complicated array to perform something similar to a sort but I decided to try alternate ideas instead to hopefully reduce processing time involved. Using ancient DOS knowledge I created something that should work, but for some reason is not (hence posting here).
I made a batch file that we will call c:\test.bat which contained the following lines:
delete c:\test.txt
dir /OS /B c:\files\*.txt>c:\test.txt
This deletes a prior existence of test.txt the pipes a directory without headers sorted by file size smallest to largest into c:\test.txt.
I then inserted the following code into the pre-existing code at the beginning:
Shell "c:\test.bat", vbHide
filepath = "c:\test.txt"
Open filepath For Input As #1
Input #1, filegrabber
Close #1
When I step through the code I can see that this works correctly, except now later on in the code I get a
Runtime error 91 Object variable or with block variable not set
in regard to assigning a FileSystemObject. Am I correct in guessing that FSO and Shell do not work well together? Also if you can suggest a better alternative to getting the smallest file from a existing directory suggestions are appreciated.
No need for sorting.
Just use Dir() to cruise through the directory. Before the loop set a Smallest Long variable to &H7FFFFFFF then inside the loop test each returned file name using the FileLen() function.
If FileLen() returns a value less than Smallest assign that size to Smallest and assign that file name to SmallestFile a String variable.
Upon loop exit if Smallest = &H7FFFFFFF there were no files, otherwise SmallestFile has your file name.
Seems incredibly simple, what am I missing?
Another approach is to use the FileSystemObject's Files collection. Just iterate the files collection for a given folder and evaluate each File object's Size property. So long as you don't have a million files in a folder or something, performance should be fine.
I have a whole bunch of text files that are exported from Photoshop that I need to import into an Excel document. I wrote a macro to get the job done and it seemed to work just fine for my test document but when I tried loading in some of the actual files produced by Photoshop Excel started putting all the data in a separate column except for the first line.
My code that reads the text file:
Open currentDocPath For Input As stream
Do Until EOF(stream)
Input #stream, currentLine
columnContents = Split(currentLine, vbTab)
For n = 0 To UBound(columnContents)
ActiveSheet.Cells(row, Chr(64 + colum + n)).Value = columnContents(n)
Next n
row = row + 1
Loop
Close stream
The text files I am reading look like this, only with much more data:
"Name" "Data" "Info" "blah"
"Name1" "Data1" "Info1" "blah1"
"Name2" "Data2" "Info2" "blah2"
The problem seemed pretty trivial, but when I load it into excel, instaed of looking like it does above it looks like this:
ÿþ"Name" "Data" "Info" "blah"
Name1
Data1
Info1
blah1
Name2
Data2
Info2
blah2
Now I am not sure why this is happening. It seems like the first two characters in the first row are there because those bytes declare the text encoding. Somehow those characters keep the first row formatted correctly while the remaining rows lose their quotation marks and all get moved to new lines.
Could someone who understands UCS-2 Little Endian text encoding explain how I can work around this? When I convert the files to ASCII it works fine.
Cheers!
edit: Okay so I understand now that the encoding is UTF-16 (I don't know a whole lot about character encoding). My main issue is that it's formatting strangely and I don't understand why or how to fix it. Thanks!
As I mentioned in my comment, it appears the file you're trying to import is encoded in UTF-16.
In this vbaexpress.com article, someone suggested that the following should work:
Dim GetOpenFile As String
Dim MyData As String
Dim r As Long
GetOpenFile = Application.GetOpenFilename
r = 1
Open GetOpenFile For Input As #1
Do While Not EOF(1)
Line Input #1, MyData
Cells(r, 1).Value = MyData
r = r + 1
Loop
Close #1
Obviously I can't test it myself, but maybe it'll help you.
Why not just tell excel to import the file. MS has probably put hundreds of thousands of person hours into that code. Record the importation to get easy code.
Remember Excel is a tool for non programmers to do programming things. Use it instead of trying to replace it.
These are the replacement file functions that you use for new code. Add a reference to Microsoft Scripting Runtime.
Opens a specified file and returns a TextStream object that can be used to read from, write to, or append to the file.
object.OpenTextFile(filename[, iomode[, create[, format]]])
Arguments
object
Required. Object is always the name of a FileSystemObject.
filename
Required. String expression that identifies the file to open.
iomode
Optional. Can be one of three constants: ForReading, ForWriting, or ForAppending.
create
Optional. Boolean value that indicates whether a new file can be created if the specified filename doesn't exist. The value is True if a new file is created, False if it isn't created. If omitted, a new file isn't created.
format
Optional. One of three Tristate values used to indicate the format of the opened file. If omitted, the file is opened as ASCII.
The format argument can have any of the following settings:
Constant Value Description
TristateUseDefault
-2
Opens the file using the system default.
TristateTrue
-1
Opens the file as Unicode.
TristateFalse
0
Opens the file as ASCII.
I am trying to make some code that will generate a random number and then check numbers on each line in a text file to see if has already been generated. I have everything but code that will check for the number generated in the text file. Any ideas?
Here is the code I have so far:
Dim Rlo As New IO.StreamReader("C:\Users\Somebody\Documents\Visual Studio 2012\Projects\RobloxRecruitV1\RobloxRecruitV1\bin\Debug\" & TheFileName.Text & ".txt")
Dim firstLine As String
'read first line
firstLine = Rlo.ReadLine()
'read secondline
TheText.Text = Rlo.ReadLine()
rndnumber = New Random
number = rndnumber.Next(firstLine, TheText.Text)
TextBox1.Text = number.ToString
I can't give you the exact code (It's been a long time since I did anything in VB6...)
but....
I can tell you that using a stream reader is the wrong approach.
A stream reader is exactly what it's name suggests. A constant stream of data, it starts and then stops when it reaches an end.
Now while it's true that you can to a small extent seek back and forth in a stream, that's not really what you need in this case.
What you need is to load all the lines of your file into an in memory array or some kind of hash table, then your task simply becomes one of looking to see if a given index exists.
If you have no choice but to use the file as is on disk (Due to size restrictions for example) then the approach you need is this:
1) Open the file
2) Set you position to the beginning
3) enter a loop reading sequential lines
4) once you have the line that corresponds to the count your looking for close the file and end
5) loop back round until no more lines left
6) close the file
opening and closing, then resetting each time is important, this is so that you KNOW EXACTLY where in the file your starting from each time, you could in theory keep the file open and just reset the position, but that in my mind could be dangerous esp if you have other processes writing to it.
If your file is not very big, then I'd opt for an in memory approach, load the file, perform operations on the in memory array of lines, then save it before exit.
I've been trying to use StreamReader to read a log file. I cannot verify what it is encoded in, as when I open it in notepad++ and select ANSI encoding, I get this result:
I'm getting the characters needed when using ANSI but they are followed by things like [NULL][EOT][SOH][NUL][SI]
When I try and read the file in VB (using StreamReader or ReadAll) with ANSI encoding selected the resulting string I get back is completely wrong.
How could I read a file like this in VB.net?
You could use the IO.File.ReadAllText("File Location", encoding as System.Text.Encoding) method,
Dim textFromFile as string = IO.File.ReadAllText("C:\Users\Jason\Desktop\login20130417.rdb", System.Text.Encoding.ASCII) 'Or Unicode, UFT32, UFT8, UFT7, BigEndianUnicode or default. Default is ANSI.
If you still don't get the text you need by using the default encoding (ANSI), then you can always try the other 6 different encoding methods.
Update...
It appears that your file is corrupt, using the code below I was able to get a binary representation of whatever is in the file, I got this,
1111111111111101000001110000010000000000000001010000000000010011000000000000100000000000000111100000000000100110000000000011100000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111110100000111000001000000000000000101000000000001001100000000000010000000000000011110000000000010100000000000111111111111110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The massive amount of null data would suggest that the file is corrupt, which would also explain why we are not getting a lot of data whenever we try to read the file.
The code,
Dim fileData As String = IO.File.ReadAllText("C:\Users\Jason\Desktop\login20130417.rdb")
Dim i As Integer = 0
Dim binaryData As String = ""
Dim ch As String = ""
Do Until i = fileData.Length
ch = fileData.Chars(i)
bin = bin & System.Convert.ToString(AscW(ch), 2).PadLeft(8, "0")
i = i + 1
Loop
As #Daniel A. White suggested in his comment, that file does not appear to be encoded like a "normal" text file. A StreamReader will not work in this situation. I would attempt to use a BinaryReader.
Rdb file? Never heard of it. Quick google makes it less clear - n64 database file, Darkbot, etc...
However considering the name you have, and the general look of the opened file, i would say its a binary file.
If you want to read the file in vb.net you'll need a library of sorts, and i can't help you with one until you are able to shed some light on what the file may be, or what it was created with.