How to avoid "Out Of Memory" exception when reading large files using File.ReadAllText(x) - vb.net

This is my code to search every file and folder on the drive "G:\" for the string "hello":
Dim path = "g:\"
Dim fileList As New List(Of String)
GetAllAccessibleFiles(path, fileList)
'Convert List<T> to string array if you want
Dim files As String() = fileList.ToArray
For Each s As String In fileList
Dim text As String = File.ReadAllText(s)
Dim index As Integer = text.IndexOf("hello")
If index >= 0 Then
MsgBox("FOUND!")
' String is in file, starting at character "index"
End If
Next
This code also results in a memory leak/out-of-memory exception (I am reading files as big as 5 GB!). Presumably it loads the whole file into RAM and only then checks for the string.
Dim text As String = File.ReadAllText("C:\Users\Farizluqman\mybigmovie.mp4")
' this file sized as big as 5GB!
Dim index As Integer = text.IndexOf("hello")
If index >= 0 Then
MsgBox("FOUND!")
' String is in file, starting at character "index"
End If
But the problem is that this code is really DANGEROUS: it may lead to a memory leak or use 100% of the RAM. Is there any way to work around this? Maybe chunking, or reading part of the file and then disposing of it, to avoid the memory leak/out-of-memory error? Or is there any way to minimize memory usage with this code? I feel responsible for the stability of other people's computers. Please help :)

You should use System.IO.StreamReader, which reads the file line by line instead of loading all the lines at the same time (there is a similar post for C#). I personally never use the ReadAll* methods except under very specific conditions. A sample adaptation of your code:
' Long rather than Integer: a 5 GB file has more characters than an Integer can count
Dim index As Long = -1
Dim offset As Long = 0 ' characters in the lines read so far (line terminators are not counted)
Using reader As New System.IO.StreamReader("C:\Users\Farizluqman\mybigmovie.mp4")
    Dim line As String = reader.ReadLine()
    Do While line IsNot Nothing
        Dim pos As Integer = line.IndexOf("hello")
        If pos >= 0 Then
            index = offset + pos
            Exit Do
        End If
        offset += line.Length
        line = reader.ReadLine()
    Loop
End Using
If index >= 0 Then
    MsgBox("FOUND!")
    ' String is in file, starting at character "index"
End If
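A binary file such as an .mp4 may contain very few line breaks, so a single ReadLine call can still return an enormous string. The following is a rough sketch of the chunked reading the question asks about, not a definitive implementation: FindInFile, its parameter names and the 65536-character chunk size are illustrative assumptions. It reads fixed-size blocks of characters and carries the last few characters over to the next block, so a match that straddles two blocks is not missed. The offset it returns counts decoded characters, not bytes.

```vbnet
Imports System
Imports System.IO

Module ChunkedSearch
    ' Hypothetical helper (not from the original answer): returns the zero-based
    ' character offset of the first occurrence of needle, or -1 if it is absent.
    Function FindInFile(filePath As String, needle As String,
                        Optional chunkSize As Integer = 65536) As Long
        If String.IsNullOrEmpty(needle) Then Return -1
        ' Room for one chunk plus up to needle.Length - 1 carried-over characters
        Dim buf(chunkSize + needle.Length - 2) As Char
        Dim carried As Integer = 0   ' characters kept from the previous chunk
        Dim consumed As Long = 0     ' file offset (in characters) of buf(0)
        Using reader As New StreamReader(filePath)
            Do
                Dim charsRead As Integer = reader.Read(buf, carried, chunkSize)
                If charsRead = 0 Then Exit Do
                Dim window As New String(buf, 0, carried + charsRead)
                Dim pos As Integer = window.IndexOf(needle, StringComparison.Ordinal)
                If pos >= 0 Then Return consumed + pos
                ' Keep the tail so a match split across two chunks is still found
                Dim keep As Integer = Math.Min(needle.Length - 1, window.Length)
                If keep > 0 Then window.CopyTo(window.Length - keep, buf, 0, keep)
                consumed += window.Length - keep
                carried = keep
            Loop
        End Using
        Return -1
    End Function
End Module
```

With this in place, the search becomes something like Dim index = FindInFile("C:\Users\Farizluqman\mybigmovie.mp4", "hello"), and memory use stays around one chunk regardless of the file size.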

Related

Text file split in blocks vb.net

I am trying to go through my text file and create a new file that will contain only the text I require. My current file looks like:
Car-1I
Colour-39
Cost-328
Dealer-28
Car-2
Colour-30
Cost-234
For each block of text I would like to read the first line. If the first line ends with an I, I then read the next line; if that line contains colour 39, I would like to save the whole block of text to another file. If these two conditions aren't met, I don't want to save the block to the new text file.
Before anyone suggests saving my values in classes: these blocks of text can vary in size and values, so I don't always have a set range of values, which is why I need to skip to the blank line.
IO.File.WriteAllText("C:\Users\test2.txt", "") 'write to new file
Dim sKey As String
Dim sValue As Integer
For Each filterLine As String In File.ReadLines("C:\Users\test.txt")
sKey = Split(filterLine, ":")(0)
sValue = Split(filterLine, ":")(1)
If Not sValue.EndsWith("I") Then
ElseIf sValue.EndsWith("I") Then
End If
Next
Another method uses File.ReadLines to read lines of text from the file. This method doesn't load all the text into memory; it reads single lines of text from disk, so it is also useful when dealing with big files.
You could loop over the IEnumerable collection it returns, but you can also use its GetEnumerator() method to control more directly when to move to the next line, or to move more than one line forward.
The enumerator's Current property returns the line of text currently read; MoveNext() moves to the next line.
A StringBuilder is used to store the strings when a match is found. Strings are added to the StringBuilder object using its AppendLine() method.
This class is useful when dealing with strings that you need to store, compare and discard (or modify) quickly: since strings are immutable, when you use String variables directly, especially in loops, you generate a whole lot of garbage that slows down any procedure quite a lot.
The blocks of text stored in the StringBuilder object are then written to a destination file using a StreamWriter with explicit encoding set to UTF-8 (writes the BOM). Its methods include asynchronous versions: WriteLine() can be replaced by Await WriteLineAsync() to allow an async procedure.
Imports System.IO
Imports System.Text
Dim sourceFilePath = "<Path of the source file>"
Dim resultsFilePath = "<Path of the destination file>"
Dim sb As New StringBuilder()
Dim enumerator = File.ReadLines(sourceFilePath).GetEnumerator()
Using sWriter As New StreamWriter(resultsFilePath, False, Encoding.UTF8)
While enumerator.MoveNext()
If enumerator.Current.EndsWith("I") Then
sb.AppendLine(enumerator.Current)
' MoveNext() returns False if the file ends right after the first line of a block
If enumerator.MoveNext() AndAlso enumerator.Current.EndsWith("39") Then
While Not String.IsNullOrWhiteSpace(enumerator.Current)
sb.AppendLine(enumerator.Current)
enumerator.MoveNext()
End While
sWriter.WriteLine(sb.ToString())
End If
sb.Clear()
End If
End While
End Using
This will work:
Dim strFile As String = "c:\Test5\Source.txt"
Dim strOutFile As String = "c:\Test5\OutPut.txt"
Dim strOutData As String = ""
Dim SourceGroups As String() = Split(File.ReadAllText(strFile), vbCrLf + vbCrLf)
For Each sGroup As String In SourceGroups
Dim OneGroup() As String = Split(sGroup, vbCrLf)
If OneGroup.Length > 1 AndAlso Strings.Right(OneGroup(0), 1) = "I" AndAlso Strings.Right(OneGroup(1), 2) = "39" Then
If strOutData <> "" Then strOutData += (vbCrLf & vbCrLf)
strOutData += sGroup
End If
Next
File.WriteAllText(strOutFile, strOutData)
Something like this should work:
' ReadAllLines returns an array; ReadLines returns an IEnumerable that cannot be indexed
Dim lines1 As String() = File.ReadAllLines("C:\Users\test.txt")
Dim c As Integer = lines1.Length
Dim i As Integer = 0
While i < c
    If lines1(i).TrimEnd().EndsWith("I") Then
        Dim base As Integer = i
        ' Check the second line of the block, if there is one
        Dim keep As Boolean = (i + 1 < c) AndAlso lines1(i + 1).TrimEnd().EndsWith("39")
        While i < c AndAlso lines1(i).TrimEnd().Length > 0 'skip to the next blank
            i += 1
        End While
        If keep Then
            ' write lines1(from base to (i-1)) here
        End If
    Else
        i += 1
    End If
End While

Count words in an external file using delimiter of a space

I want to calculate the number of words in a text file using a delimiter of a space (" "), however I am struggling.
Dim counter = 0
Dim delim = " "
Dim fields() As String
fields = Nothing
Dim line As String
line = Input
While (SR.EndOfStream)
line = SR.ReadLine()
End While
Console.WriteLine(vbLf & "Reading File.. ")
fields = line.Split(delim.ToCharArray())
For i = 0 To fields.Length
counter = counter + 1
Next
SR.Close()
Console.WriteLine(vbLf & "The word count is {0}", counter)
I do not know how to open the file and get this to work. I am very confused and would like an explanation, so I can edit the code and learn from it.
You're going to be reading a file as the source of the data, so let's create a variable to refer to its filename:
Dim srcFile = "C:\temp\twolines.txt"
As you have shown already, a variable is needed to hold the number of words found:
Dim counter = 0
To read from the file, a StreamReader will do the job. Now, we look at the documentation for it (yes, really) and notice that it has a Dispose method. That means that we have to explicitly dispose of it after we've used it, to make sure that no system resources (such as the open file handle) stay tied up longer than necessary. Fortunately, there is the Using construct to take care of that for us:
Using sr As New StreamReader(srcFile)
And now we want to iterate over the content of the file line-by-line until the end of the file:
While Not sr.EndOfStream
Then we want to read a line and find how many items separated by spaces it has:
counter += sr.ReadLine().Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Length
The += operator is shorthand: "a += n" means "a = a + n". The {" "c} is an array literal containing the single character " "c; the c suffix marks it as a Char rather than a one-character String. StringSplitOptions.RemoveEmptyEntries means that if the text was "one  two" (with two spaces between the words), the empty entry produced by the extra space would be ignored.
So, if you were writing a console program, it might look like:
Imports System.IO
Module Module1
Sub Main()
Dim srcFile = "C:\temp\twolines.txt"
Dim counter = 0
Using sr As New StreamReader(srcFile)
While Not sr.EndOfStream
counter += sr.ReadLine().Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Length
End While
End Using
Console.WriteLine(counter)
Console.ReadLine()
End Sub
End Module
Any embellishments such as writing out what the number represents or error checking are left up to you.
With Path.Combine you don't have to worry about where the slashes or backslashes go. You can get the path of special folders easily using the Environment class. The methods of the File class in System.IO are Shared, so you don't have to create an instance.
Public Sub Main()
Dim p As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "Chapters.txt")
Debug.Print(Environment.SpecialFolder.MyDocuments.ToString)
Dim count As Integer = GetCount(p)
Console.WriteLine(count)
Console.ReadKey()
End Sub
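Private Function GetCount(filePath As String) As Integer
' A parameter named Path would hide System.IO.Path inside this function
Dim s = File.ReadAllText(filePath)
' An empty separator array splits on any whitespace; empty entries are dropped so repeated spaces and line breaks are not counted
Return s.Split(New Char() {}, StringSplitOptions.RemoveEmptyEntries).Length
End Function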
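Use the Split function, then take the length of the resulting array directly: for words separated by single spaces, that length is the word count (it equals the number of spaces plus 1).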

VB.net How to remove quotes characters from a streamReader.readline.split()

I had built a project that reads data from a report, and it used to work fine, but now for some reason the report puts everything into strings. So I want to modify my stream reader to remove or ignore the quotes as it reads the lines.
This is a snippet of the part that reads the lines.
Dim RawEntList As New List(Of Array)()
Dim newline() As String
Dim CurrentAccountName As String
Dim CurrentAccount As Account
Dim AccountNameExsists As Boolean
Dim NewAccount As Account
Dim NewEntry As Entrey
Dim WrongFileErrorTrigger As String
ListOfLoadedAccountNames.Clear()
'opens the file
Try
Dim sr As New IO.StreamReader(File1Loc)
Console.WriteLine("Loading full report, please wait")
MsgBox("Loading full report may take a while.")
'While we have not finished reading the file
While (Not sr.EndOfStream)
'splitting each line up into an array
newline = sr.ReadLine.Split(","c)
'storing each array into a list
RawEntList.Add(newline)
End While
And then of course I iterate through the list to pull out information to populate objects like this:
For Each Entr In RawEntList
CurrentAccountName = Entr(36)
AccountNameExsists = False
For Each AccountName In ListOfLoadedAccountNames
If CurrentAccountName = AccountName Then
AccountNameExsists = True
End If
Next
You could just do
StringName.Replace(ControlChars.Quote, "")
or
StringName.Replace(Chr(34), "")
OR
streamReader.ReadLine().Replace(Chr(34), "").Split(","c)
How about doing the replace before the split, after the ReadLine? That saves repeating the replace for every field. Better yet (if possible), do the replace on the entire file (if the data is formatted in a way that allows it and you have enough memory): read it with the ReadAllText method of the File class, do your replace, then read the lines from memory to build your array (super fast!).
File.ReadAllText(path)
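As a minimal sketch of the "replace before the split" suggestion, assuming the report is a plain comma-separated file: ReadUnquotedRows and its module name are made-up names for illustration, and the result is a list of string arrays like the question's RawEntList. It streams the file with File.ReadLines rather than loading it whole, which is one possible trade-off, not the only one.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module QuoteStripping
    ' Hypothetical helper: reads a comma-separated file and returns one
    ' string array per line, with every double quote removed.
    Function ReadUnquotedRows(filePath As String) As List(Of String())
        Dim rows As New List(Of String())
        For Each line As String In File.ReadLines(filePath)
            ' """" is a string holding a single double-quote character.
            ' Replace runs once per line, before Split, not once per field.
            rows.Add(line.Replace("""", "").Split(","c))
        Next
        Return rows
    End Function
End Module
```

Note that this only works if no quoted field contains a comma of its own; stripping the quotes first would then split that field in two.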

For some reason my program is not reading the file that I asked to be read

You can see that I've opened the file just below, but I've recently discovered that it is reading a file opened above, which I have closed.
Dim TestNO As Integer
Dim myLines As New List(Of String)
Dim sb As StringBuilder
FileOpen(10, "F:\Computing\Spelling Bee\testtests.csv", OpenMode.Input)
Dim Item() As String = Split(fullline, ",")
Dim MaxVal As Integer = Integer.MaxValue
Do Until EOF(10)
fullline = LineInput(10)
If Item(7) > MaxVal Then
MaxVal = Item(7)
TestNO = MaxVal
End If
Loop
This is where I open and close my previous file.
Dim flag As Boolean = False
FileOpen(1, "F:\Computing\Spelling Bee\stdnt&staffdtls\stdnt&staffdtls.csv",
OpenMode.Input)
Do Until EOF(1)
fullline = LineInput(1)
Dim item() As String = Split(fullline, ",")
If enteredusername = item(0) And enteredpassword = item(1) Then
Console.WriteLine()
Console.Clear()
Console.WriteLine("Welcome," & item(3) & item(4))
Threading.Thread.Sleep(1000)
Console.Clear()
flag = True
If item(2) = "p" Then
FileClose(1)
pupilmenu()
ElseIf item(2) = "s" Then
FileClose(1)
staffmenu()
ElseIf item(2) = "a" Then
FileClose(1)
adminmenu()
FileOpen, EOF, and LineInput are all old VB6-style methods which are provided primarily for backwards compatibility. It would be far preferable to use the newer .NET classes provided in the System.IO namespace. For instance, this same task is easily performed like this:
For Each line As String In File.ReadAllLines("F:\Computing\Spelling Bee\stdnt&staffdtls\stdnt&staffdtls.csv")
Dim fields() As String = line.Split(","c)
If (enteredUserName = fields(0)) And (enteredPassword = fields(1)) Then
' ...
End If
Next
Notice that I also used line.Split rather than Split(line), which is also an old VB6-style method. It's better to use the new String.Split method.
The File.ReadAllLines method opens the file, reads the entire contents of the file, closes the file, and then returns all of the data as an array of strings, with each item in the array being one line from the file. This is a very simple way to read an entire file. If it is a particularly large file, however, it would be better to read one line at a time with File.ReadLines or a StreamReader (a FileStream on its own only reads bytes, not lines).
Also, it's worth mentioning that reading a CSV file yourself can be complicated. They aren't always as simple as simply splitting on commas. For instance, the following is an example of a valid CSV line:
Bill, "Red, White, and Blue", Smith
As you can see, that line only contains three fields, but it contains four commas. Also, the quotation marks should not be considered as part of the value in the second field. The easiest way to read a CSV file is to use the TextFieldParser class, which handles all of those eccentricities.
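A short sketch of reading such a file with TextFieldParser (from the Microsoft.VisualBasic.FileIO namespace) might look like the following. PrintFields and the module name are made up for illustration, and the " | " separator is only there to make the field boundaries visible when printing.

```vbnet
Imports System
Imports Microsoft.VisualBasic.FileIO

Module CsvReading
    ' Hypothetical helper: prints the fields of each CSV line, letting
    ' TextFieldParser deal with quoted fields that contain commas.
    Sub PrintFields(filePath As String)
        Using parser As New TextFieldParser(filePath)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                ' ReadFields returns the fields of the current line as a String array
                Dim fields As String() = parser.ReadFields()
                Console.WriteLine(String.Join(" | ", fields))
            End While
        End Using
    End Sub
End Module
```

In the login code above, fields(0) and fields(1) from ReadFields would then take the place of the values produced by line.Split.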

linq submitchanges runs out of memory

I have a database with about 180,000 records. I'm trying to attach a PDF file to each of those records. Each PDF is about 250 KB in size. However, after about a minute my program starts taking up about a GB of memory and I have to stop it. I tried making sure the reference to each LINQ object is removed once it's updated, but that doesn't seem to help. How can I make it clear the reference?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
Dim indexFiles = (From indexFile In dataContext.IndexFiles
Where indexFile.PDFContent = Nothing
Order By indexFile.PDFFolder).ToList
Dim currentDirectory As IO.DirectoryInfo
Dim currentFile As IO.FileInfo
Dim tempIndexFile As IndexFile
While indexFiles.Count > 0
tempIndexFile = indexFiles(0)
indexFiles = indexFiles.Skip(1).ToList
currentDirectory = 'I set the directory that I need
currentFile = 'I get the file that I need
writePDF(currentDirectory, currentFile, tempIndexFile)
End While
End Sub
Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
Dim bytes() As Byte
bytes = getFileStream(file)
indexFile.PDFContent = bytes
dataContext.SubmitChanges()
counter += 1
If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub
Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
Dim fileStream = fileInfo.OpenRead()
Dim bytesLength As Long = fileStream.Length
Dim bytes(bytesLength) As Byte
fileStream.Read(bytes, 0, bytesLength)
fileStream.Close()
Return bytes
End Function
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)
As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.
Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records etc.
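On the file-reading aside: in the question's getFileStream, Dim bytes(bytesLength) allocates bytesLength + 1 elements (VB array bounds are inclusive), a single Stream.Read call is not guaranteed to fill the buffer, and the stream is not closed if an exception is thrown. A minimal replacement using File.ReadAllBytes could look like this; GetFileBytes and the module name are illustrative, not from the original code.

```vbnet
Imports System.IO

Module PdfLoading
    ' Hypothetical replacement for getFileStream: ReadAllBytes opens the file,
    ' reads exactly its contents into a correctly sized array, and closes it.
    Function GetFileBytes(fileInfo As FileInfo) As Byte()
        Return File.ReadAllBytes(fileInfo.FullName)
    End Function
End Module
```

In writePDF the assignment would then read bytes = GetFileBytes(file).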
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If so, you should be able to fix the situation by periodically disposing the DataContext and creating a new one, or turning object tracking off.
OK. To use the smallest amount of memory we have to update the DataContext in blocks. I've put some sample code below. It might have syntax errors since I'm using Notepad to type it in.
Dim DB As New YourDataContext
Dim BlockSize As Integer = 25
' Materialise the list once; re-running the query would skip records that were already updated
Dim AllItems = DB.Items.Where(Function(i) i.PDFfile.HasValue = False).ToList()
Dim count = 0
Dim tmpDB As New YourDataContext
While count < AllItems.Count
    Dim id = AllItems(count).recordID
    Dim _item = tmpDB.Items.Single(Function(i) i.recordID = id)
    _item.PDF = GetPDF()
    count += 1
    If count Mod BlockSize = 0 OrElse count = AllItems.Count Then
        tmpDB.SubmitChanges()
        ' Replace the context so the entities it tracked can be garbage collected
        tmpDB.Dispose()
        tmpDB = New YourDataContext
    End If
End While
tmpDB.Dispose()
To further optimise the speed, you can fetch just the recordIDs from AllItems into an array (for example as an anonymous type), and turn on Delay Loaded for that PDF field.