Is there a better way to count the lines in a text file? - vb.net

Below is what I've been using. While it does work, my program locks up when trying to count a rather large file, say 10,000 or more lines. Smaller files run in no time.
Is there a better or should I say faster way to count the lines in a text file?
Here's what I'm currently using:
Dim selectedItems = (From i In ListBox1.SelectedItems).ToArray()
For Each selectedItem In selectedItems
ListBox2.Items.Add(selectedItem)
ListBox1.Items.Remove(selectedItem)
Dim FileQty = selectedItem.ToString
'reads the data file and returns the qty
Dim intLines As Integer = 0
'Dim sr As New IO.StreamReader(OpenFileDialog1.FileName)
Dim sr As New IO.StreamReader(TextBox1_Path.Text + "\" + FileQty)
Do While sr.Peek() >= 0
TextBox1.Text += sr.ReadLine() & ControlChars.CrLf
intLines += 1
Loop
ListBox6.Items.Add(intLines)
Next

Imports System.IO.File 'At the beginning of the file
Dim lineCount = File.ReadAllLines("file.txt").Length
See this question.

Even if you make your iteration as efficient as can be, if you hand it a large enough file you're going to make the application freeze while it performs the work.
If you want to avoid the locking, you could spawn a new thread and perform the work asynchronously. If you're using .NET 4.0 you can use the Task class to make this very easy.

TextBox2.Text = File.ReadAllLines(scannerfilePath).Length.ToString()

Related

Search text file for a ranged value

I want to read and write the same file with StreamReader and StreamWriter. I know that in my code I am trying to open the file twice and that is the problem. Could anyone give me another way to do this? I got confused a bit.
As for the program, I wanted to create a program where I create a text if it doesnt exist. If it exists then it compares each line with a Listbox and see if the value from the Listbox appears there. If it doesnt then it will add to the text.
Dim SR As System.IO.StreamReader
Dim SW As System.IO.StreamWriter
SR = New System.IO.StreamReader("D:\temp\" & Cerberus.TextBox1.Text & "_deleted.txt", True)
SW = New System.IO.StreamWriter("D:\temp\" & Cerberus.TextBox1.Text & "_deleted.txt", True)
Dim strLine As String
Do While SR.Peek <> -1
strLine = SR.ReadLine()
For i = 0 To Cerberus.ListBox2.Items.Count - 1
If Cerberus.ListBox2.Items.Item(i).Contains(strLine) = False Then
SW.WriteLine(Cerberus.ListBox2.Items.Item(i))
End If
Next
Loop
SR.Close()
SW.Close()
SR.Dispose()
SW.Dispose()
MsgBox("Duplicates Removed!")
If your file is not that large, consider using File.ReadAllLines and File.WriteAllLines.
Dim path = "D:\temp\" & Cerberus.TextBox1.Text & "_deleted.txt"
Dim lines = File.ReadAllLines(path) 'String() -- holds all the lines in memory
Dim linesToWrite = Cerberus.ListBox2.Items.Cast(Of String).Except(lines)
File.AppendAllLines(path, linesToWrite)
If the file is large, but you only have to write a few lines, then you can use File.ReadLines:
Dim lines = File.ReadLines(path) 'IEnumerable(Of String)\
'holds only a single line in memory at a time
'but the file remains open until the iteration is finished
Dim linesToWrite = Cerberus.ListBox2.Items.Cast(Of String).Except(lines).ToList
File.AppendAllLines(path, linesToWrite)
If there are a large number of lines to write, then use the answers from this question.

Progress bar with VB.NET Console Application

I've written a parsing utility as a Console Application and have it working pretty smoothly. The utility reads delimited files and based on a user value as a command line arguments splits the record to one of 2 files (good records or bad records).
Looking to do a progress bar or status indicator to show work performed or remaining work while parsing. I could easily write a <.> across the screen within the loop but would like to give a %.
Thanks!
Here is an example of how to calculate the percentage complete and output it in a progress counter:
Option Strict On
Option Explicit On
Imports System.IO
Module Module1
Sub Main()
Dim filePath As String = "C:\StackOverflow\tabSeperatedFile.txt"
Dim FileContents As String()
Console.WriteLine("Reading file contents")
Using fleStream As StreamReader = New StreamReader(IO.File.Open(filePath, FileMode.Open, FileAccess.Read))
FileContents = fleStream.ReadToEnd.Split(CChar(vbTab))
End Using
Console.WriteLine("Sorting Entries")
Dim TotalWork As Decimal = CDec(FileContents.Count)
Dim currentLine As Decimal = 0D
For Each entry As String In FileContents
'Do something with the file contents
currentLine += 1D
Dim progress = CDec((currentLine / TotalWork) * 100)
Console.SetCursorPosition(0I, Console.CursorTop)
Console.Write(progress.ToString("00.00") & " %")
Next
Console.WriteLine()
Console.WriteLine("Finished.")
Console.ReadLine()
End Sub
End Module
1rst you have to know how many lines you will expect.
In your loop calculate "intLineCount / 100 * intCurrentLine"
int totalLines = 0 // "GetTotalLines"
int currentLine = 0;
foreach (line in Lines)
{
/// YOUR OPERATION
currentLine ++;
int progress = totalLines / 100 * currentLine;
///print out the result with the suggested method...
///!Caution: if there are many updates consider to update the output only if the value has changed or just every n loop by using the MOD operator or any other useful approach ;)
}
and print the result on the same posititon in your loop by using the SetCursor method
MSDN Console.SetCursorPosition
VB.NET:
Dim totalLines as Integer = 0
Dim currentLine as integer = 0
For Each line as string in Lines
' Your operation
currentLine += 1I
Dim Progress as integer = (currentLine / totalLines) * 100
' print out the result with the suggested method...
' !Caution: if there are many updates consider to update the output only if the value has changed or just every n loop by using the MOD operator or any other useful approach
Next
Well The easiest way is to update the progressBar variable often,
Ex: if your code consist of around 100 lines or may be 100 functionality
after each function or certain lines of code update progressbar variable with percentage :)

Start reading massive text file from the end

I would ask if you could give me some alternatives in my problems.
basically I'm reading a .txt log file averaging to 8 million lines. Around 600megs of pure raw txt file.
I'm currently using streamreader to do 2 passes on those 8 million lines doing sorting and filtering important parts in the log file, but to do so, My computer is taking ~50sec to do 1 complete run.
One way that I can optimize this is to make the first pass to start reading at the end because the most important data is located approximately at the final 200k line(s) . Unfortunately, I searched and streamreader can't do this. Any ideas to do this?
Some general restriction
# of lines varies
size of file varies
location of important data varies but approx at the final 200k line
Here's the loop code for the first pass of the log file just to give you an idea
Do Until sr.EndOfStream = True 'Read whole File
Dim streambuff As String = sr.ReadLine 'Array to Store CombatLogNames
Dim CombatLogNames() As String
Dim searcher As String
If streambuff.Contains("CombatLogNames flags:0x1") Then 'Keyword to Filter CombatLogNames Packets in the .txt
Dim check As String = streambuff 'Duplicate of the Line being read
Dim index1 As Char = check.Substring(check.IndexOf("(") + 1) '
Dim index2 As Char = check.Substring(check.IndexOf("(") + 2) 'Used to bypass the first CombatLogNames packet that contain only 1 entry
If (check.IndexOf("(") <> -1 And index1 <> "" And index2 <> " ") Then 'Stricter Filters for CombatLogNames
Dim endCLN As Integer = 0 'Signifies the end of CombatLogNames Packet
Dim x As Integer = 0 'Counter for array
While (endCLN = 0 And streambuff <> "---- CNETMsg_Tick") 'Loops until the end keyword for CombatLogNames is seen
streambuff = sr.ReadLine 'Reads a new line to flush out "CombatLogNames flags:0x1" which is unneeded
If ((streambuff.Contains("---- CNETMsg_Tick") = True) Or (streambuff.Contains("ResponseKeys flags:0x0 ") = True)) Then
endCLN = 1 'Value change to determine end of CombatLogName packet
Else
ReDim Preserve CombatLogNames(x) 'Resizes the array while preserving the values
searcher = streambuff.Trim.Remove(streambuff.IndexOf("(") - 5).Remove(0, _
streambuff.Trim.Remove(streambuff.IndexOf("(")).IndexOf("'")) 'Additional filtering to get only valuable data
CombatLogNames(x) = search(searcher)
x += 1 '+1 to Array counter
End If
End While
Else
'MsgBox("Something went wrong, Flame the coder of this program!!") 'Bug Testing code that is disabled
End If
Else
End If
If (sr.EndOfStream = True) Then
ReDim GlobalArr(CombatLogNames.Length - 1) 'Resizing the Global array to prime it for copying data
Array.Copy(CombatLogNames, GlobalArr, CombatLogNames.Length) 'Just copying the array to make it global
End If
Loop
You CAN set the BaseStream to the desired reading position, you just cant set it to a specfic LINE (because counting lines requires to read the complete file)
Using sw As New StreamWriter("foo.txt", False, System.Text.Encoding.ASCII)
For i = 1 To 100
sw.WriteLine("the quick brown fox jumps ovr the lazy dog")
Next
End Using
Using sr As New StreamReader("foo.txt", System.Text.Encoding.ASCII)
sr.BaseStream.Seek(-100, SeekOrigin.End)
Dim garbage = sr.ReadLine ' can not use, because very likely not a COMPLETE line
While Not sr.EndOfStream
Dim line = sr.ReadLine
Console.WriteLine(line)
End While
End Using
For any later read attempt on the same file, you could simply save the final position (of the basestream) and on the next read to advance to that position before you start reading lines.
What worked for me was skipping first 4M lines (just a simple if counter > 4M surrounding everything inside the loop), and then adding background workers that did the filtering, and if important added the line to an array, while main thread continued reading the lines. This saved about third of the time at the end of a day.

Creating Fixed Width files from strings

I have searched high and low on the internet and I can't find a straight answer to this !
I have a file that has approx 100,000 characters in one long line.
I need to read this file in and write it out again in its entirety, in lines 102 character long ending with VbCrLf. There are no delimiters.
I thought there were a number of ways to tackle issues like this in VB Script... but
apparently not !
Can anyone please provide me with a pointer ?
Here's something (off the top of my head - untested!) that should get you started.
Const ForReading = 1
Const ForWriting = 2
Dim sNewLine
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("OldFile.txt", ForReading) ' Your input file
Set tsOut = fso.OpenTextFile("NewFile.txt", ForWriting) ' New (output) file
While Not tsIn.AtEndOfStream ' While there is still text
sNewLine = tsIn.Read(102) ' Read 120 characters
tsOut.Write sNewLine & vbCrLf ' Write out to new file + CR/LF
Wend ' Loop to repeat
tsIn.Close
tsOut.Close
I won't cover the reading of files, since that is stuff you can find everywhere. And since it's been years I've coded in vb or vbscript, I hope that .net code will suffice.
pseudo: read line from file, put it in for example a string (performance issues anyone?).
A simple algorithm would be and this might have performance issues (multithreading, parallel could be a solution):
Public Sub foo()
Dim strLine As String = "foo²"
Dim strLines As List(Of String) = New List(Of String)
Dim nrChars = strLine.ToCharArray.Count
Dim iterations = nrChars / 102
For i As Integer = 0 To iterations - 1
strLines.Add(strLine.Substring(0, 102))
strLine = strLine.Substring(103)
Next
'save it to file
End Sub

VB.Net Replacing Specific Values in a Large Text File

I have some large csv files (1.5gb each) where I need to replace specific values. The method I'm currently using is terribly slow and I'm fairly certain that there should be a way to speed this up but I'm just not experienced enough to know what I should be doing. This is my first post and I tried searching through to find something relevant but didn't come across anything. Any help would be appreciated.
My other thought would be to break the file into chunks so that I can read the entire thing into memory, do all of the replacements there and then output to a consolidated file. I tried this but the way I did it actually ended up seeming slower than my current method.
Thanks!
Sub Main()
Dim fName As String = "2009.csv"
Dim wrtFile As String = "2009.1.csv"
Dim lRead
Dim lwrite As String
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim bulkWrite As String
bulkWrite = ""
Do While strRead.Peek <> -1
lRead = Split(strRead.ReadLine(), ",")
If lRead(9) = "5MM+" Then lRead(9) = "5000000"
If lRead(9) = "1MM+" Then lRead(9) = "1000000"
lwrite = ""
For i = LBound(lRead) To UBound(lRead)
lwrite = lwrite & lRead(i) & ","
Next
strWrite.WriteLine(lwrite)
Loop
strRead.Close()
strWrite.Close()
End Sub
You are splitting and the combining, which can take some time.
Why not just read the line of text. Then replace any occurance of "5MM+" and "1MM+" with the approiate value and then write the line.
Do While ...
s = strRead.ReadLine();
s = s.Replace("5MM+", "5000000")
s = s.Replace("1MM+", "1000000")
strWrite(s);
Loop