Most efficient way to jump through a file and read lines? - vb.net

I want to use a FileStream and seek from the beginning of the file, moving forward 1% (0.01) of the file size at a time.
At each position I want to read the entire line; if it matches my criteria, I am done. If not, I seek ahead another 0.01.
C# is OK, but VB.NET is preferred.
I used to do it something like this in VB6...
FileOpen(1, CurrentFullPath, OpenMode.Input, OpenAccess.Read, OpenShare.Shared)
Dim FileLength As Long = LOF(1)
For x As Single = 0.99 To 0 Step -0.01
Seek(1, CInt(FileLength * x))
Dim S As String = LineInput(1)
S = LineInput(1)
filePosition = Seek(1)
If filePosition < 50000 Then
filePosition = 1
Exit For
End If
V = Split(S, ",")
Dim MessageTime As Date = CDate(V(3) & " " & Mid$(V(4), 1, 8))
Dim Diff As Integer = DateDiff(DateInterval.Minute, MessageTime, CDate(RequestedStartTime))
If Diff >= 2 Then
Exit For
End If
Next
But I don't want to use FileOpen, I want to use a FileStream.
Any help is greatly appreciated!

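This is a more or less direct conversion of your code, where we use the reader's BaseStream.Position to specify where in the file to read:
Using streamReader As System.IO.StreamReader = System.IO.File.OpenText(CurrentFullPath)
    For x As Single = 0.99 To 0 Step -0.01
        streamReader.BaseStream.Position = CLng(streamReader.BaseStream.Length * x)
        ' Drop whatever the reader buffered at the previous position
        streamReader.DiscardBufferedData()
        ' The first ReadLine usually returns a partial line, so read twice (as in your code)
        Dim S As String = streamReader.ReadLine()
        S = streamReader.ReadLine()
        '... etc.
    Next
End Using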
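Seeking on a StreamReader's underlying stream has two pitfalls: the reader keeps its own buffer, and the new position almost always falls in the middle of a line. The following is a minimal sketch of one way to handle both. ReadLineAfter and the temporary test file are hypothetical names used only for illustration, not part of the original code.

```vbnet
Imports System
Imports System.IO

Module SeekDemo
    ' Hypothetical helper: returns the first complete line that starts
    ' after the given byte offset, or Nothing if there is none.
    Function ReadLineAfter(reader As StreamReader, offset As Long) As String
        reader.BaseStream.Seek(offset, SeekOrigin.Begin)
        ' The reader still holds data from its previous position; drop it.
        reader.DiscardBufferedData()
        ' Unless we are at the very start, assume we landed mid-line
        ' and throw away the remainder of that line.
        If offset > 0 Then reader.ReadLine()
        Return reader.ReadLine()
    End Function

    Sub Main()
        ' Assumption: a small temporary file stands in for the real log file.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllLines(tempFile, {"alpha", "bravo", "charlie", "delta"})
        Using reader As New StreamReader(tempFile)
            Dim length As Long = reader.BaseStream.Length
            For percent As Integer = 0 To 99 Step 25
                Dim line As String = ReadLineAfter(reader, length * percent \ 100)
                Console.WriteLine("{0}%: {1}", percent, If(line, "(no complete line)"))
            Next
        End Using
        File.Delete(tempFile)
    End Sub
End Module
```

Note that if an offset happens to land exactly on the start of a line, this sketch still skips that line, which is usually acceptable when sampling a large file.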

What about something like this (C# version):
using (var file = System.IO.File.OpenText(filename))
{
while (!file.EndOfStream)
{
string line = file.ReadLine();
//do your logic here
//Logical test - if true, then break
}
}
EDIT: VB version here (warning: written by a C# dev!)
Using file As StreamReader = File.OpenText(filename)
    While Not file.EndOfStream
        Dim line As String = file.ReadLine()
        ' Test the line here
        ' Exit While if the condition is met
    End While
End Using

I normally prefer vb.net, but C#'s iterator blocks are slowly winning me over:
public static IEnumerable<string> SkimFile(string FileName)
{
long delta = new FileInfo(FileName).Length / 100;
long position = 0;
using (StreamReader sr = new StreamReader(FileName))
{
while (position < 100)
{
sr.BaseStream.Seek(position * delta, SeekOrigin.Begin);
sr.DiscardBufferedData(); // forget data buffered at the old position
yield return sr.ReadLine();
position++;
}
}
}
Put it in a class library project and use it from vb like this:
Dim isMatch as Boolean = False
For Each s As String in SkimFile("FileName.txt")
If (RequestedDate - CDate(s.Substring(3, 11))).TotalMinutes > 2 Then
isMatch = True
Exit For
End If
Next s
(I took some liberties with your criteria, assuming fixed-width values rather than delimited ones, to make the example easier.)

There's an example on MSDN.
Edit in response to comment:
I must admit I'm a bit confused: you seem insistent on using a buffered FileStream, but want to read the file a line at a time? You can do that quite simply using a StreamReader. I don't know VB, but in C# it would be something like this:
using (StreamReader sr = File.OpenText(pathToFile))
{
string line = String.Empty;
while ((line = sr.ReadLine()) != null)
{
// process line
}
}
See http://msdn.microsoft.com/en-us/library/system.io.file.aspx.

Related

Fastest Method to (read, remove, write) to a Text File

I coded a simple program that reads a text file line by line. If the current line contains alphabetic characters (a-z, A-Z), it writes that line to another text file.
If the current line contains no alphabetic characters, it is not written to the new file.
I created this because members register on my website and some of them use only numbers as their username. I want to filter those out and save only the alphabetic names. (Please focus on this project; I know I could just use PHP.)
That already works, but reading line by line and writing to the other text file takes a while (write speed is about 150 KB per minute, and it's not my drive; I have a fast SSD).
So I wonder whether there is a faster way. I could use ReadAllLines first, but on large files it just freezes my program, so I don't know if that would work either (I want to focus on large files of 1 GB or more).
This is my code so far:
If System.IO.File.Exists(FILE_NAME) = True Then
Dim objReader As New System.IO.StreamReader(FILE_NAME)
Do While objReader.Peek() <> -1
Dim myFile As New FileInfo(output)
Dim sizeInBytes As Long = myFile.Length
If sizeInBytes > splitvalue Then
outcount += 1
output = outputold + outcount.ToString + ".txt"
File.Create(output).Dispose()
End If
count += 1
TextLine = objReader.ReadLine() & vbNewLine
Console.WriteLine(TextLine)
If CheckForAlphaCharacters(TextLine) Then
File.AppendAllText(output, TextLine)
Else
found += 1
Label2.Text = "Removed: " + found.ToString
TextBox1.Text = TextLine
End If
Label1.Text = "Checked: " + count.ToString
Loop
MessageBox.Show("Finish!")
End If
First of all, as hinted by @Sean Skelly, repeatedly updating UI controls is an expensive operation.
But your bigger problem is File.AppendAllText:
If CheckForAlphaCharacters(TextLine) Then
File.AppendAllText(output, TextLine)
Else
found += 1
Label2.Text = "Removed: " + found.ToString
TextBox1.Text = TextLine
End If
AppendAllText(String, String)
Opens a file, appends the specified string to the file, and then
closes the file. If the file does not exist, this method creates a
file, writes the specified string to the file, then closes the file.
Source
You are repeatedly opening and closing a file, causing overhead. AppendAllText is a convenience method since it performs several operations in one single call but you can now see why it's not performing well in a big loop.
The fix is easy. Open the file once before the loop starts and close it at the end. Make sure the file is always closed properly, even when an exception occurs. To do that, either call Close in a Finally block or keep your file write operations within a Using block, which disposes the writer for you.
You could also remove the print to the console; display updates have a cost too. Alternatively, print a status update only every 10K lines or so.
When you've done all that, you should notice improved performance.
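As a rough illustration of that advice, here is a minimal sketch that opens the output once inside a Using block and reports progress only every 10,000 lines. HasLetter is a hypothetical stand-in for the asker's CheckForAlphaCharacters (it accepts any Unicode letter, not just a-z and A-Z), and FilterFile and the temporary files are assumptions made for the example.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module FilterDemo
    ' Hypothetical stand-in for CheckForAlphaCharacters.
    Function HasLetter(value As String) As Boolean
        Return value.Any(Function(c) Char.IsLetter(c))
    End Function

    ' Copies lines containing at least one letter; returns how many were dropped.
    Function FilterFile(inputPath As String, outputPath As String) As Integer
        Dim removed As Integer = 0
        Dim checkedCount As Integer = 0
        ' The writer is opened once and closed by End Using, even on exceptions.
        Using writer As New StreamWriter(outputPath)
            ' File.ReadLines streams the input instead of loading it all at once.
            For Each line As String In File.ReadLines(inputPath)
                checkedCount += 1
                If HasLetter(line) Then
                    writer.WriteLine(line)
                Else
                    removed += 1
                End If
                ' Report progress only occasionally; display updates are costly.
                If checkedCount Mod 10000 = 0 Then Console.WriteLine("Checked: {0}", checkedCount)
            Next
        End Using
        Return removed
    End Function

    Sub Main()
        Dim source As String = Path.GetTempFileName()
        Dim target As String = Path.GetTempFileName()
        File.WriteAllLines(source, {"alice", "12345", "bob42", "777"})
        Console.WriteLine("Removed: {0}", FilterFile(source, target))
        File.Delete(source)
        File.Delete(target)
    End Sub
End Module
```

In a Windows Forms program the same throttling idea applies to Label updates: set the label text every few thousand lines rather than on every iteration.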
My final code. It works a lot faster now (500 MB in 1 minute):
Using sw As StreamWriter = File.CreateText(output)
For Each oneLine As String In File.ReadLines(FILE_NAME)
Try
If changeme = True Then
changeme = False
GoTo Again2
End If
If oneLine.Contains(":") Then
Dim TestString = oneLine.Substring(0, oneLine.IndexOf(":")).Trim()
Dim TestString2 = oneLine.Substring(oneLine.IndexOf(":")).Trim()
If CheckForAlphaCharacters(TestString) = False And CheckForAlphaCharacters(TestString2) = False Then
sw.WriteLine(oneLine)
Else
found += 1
End If
ElseIf oneLine.Contains(";") Or oneLine.Contains("|") Or oneLine.Contains(" ") Then
Dim oneLineReplac As String = oneLine.Replace(" ", ":")
Dim oneLineReplace As String = oneLineReplac.Replace("|", ":")
Dim oneLineReplaced As String = oneLineReplace.Replace(";", ":")
If oneLineReplaced.Contains(":") Then
Dim TestString3 = oneLineReplaced.Substring(0, oneLineReplaced.IndexOf(":")).Trim()
Dim TestString4 = oneLineReplaced.Substring(oneLineReplaced.IndexOf(":")).Trim()
If CheckForAlphaCharacters(TestString3) = False And CheckForAlphaCharacters(TestString4) = False Then
sw.WriteLine(oneLineReplaced)
Else
found += 1
End If
Else
errors += 1
textstring = oneLine
End If
Else
errors += 1
textstring = oneLine
End If
count += 1
Catch
errors += 1
textstring = oneLine
End Try
Next
End Using

Array Out of bounds error VB

Sorry for the terrible wording on my last question, I was half asleep and it was midnight. This time I'll try to be more clear.
I'm currently writing some code for a mini barcode scanner and stock manager program. I've got the input and everything sorted out, but there is a problem with my arrays.
I'm currently trying to extract the contents of the stock file and sort them out into product tables.
This is my current code for getting the data:
Using fs As StreamReader = New StreamReader("The File Path (Is private)")
Dim line As String = "ERROR"
line = fs.ReadLine()
While line <> Nothing
Dim pos As Integer = 0
Dim split(3) As String
pos = products.Length
split = line.Split("|")
productCodes(productCodes.Length) = split(0)
products(products.Length, 0) = split(1)
products(products.Length, 1) = split(2)
products(products.Length, 2) = split(3)
line = fs.ReadLine()
End While
End Using
I have made sure that the file path does, in fact, go to the file. I have looked through debug to find that all the data is going through into my "split" table. The error throws as soon as I start trying to transfer the data.
This is where I declare the two tables being used:
Dim productCodes() As String = {}
Dim products(,) As Object = {}
Can somebody please explain why this is happening?
Thanks in advance
~Hydro
By declaring the arrays like you did:
Dim productCodes() As String = {}
Dim products(,) As Object = {}
You are giving all your arrays a size of 0, so inside the loop the code tries to write to a position that does not exist. Note also that productCodes(productCodes.Length) is always one past the last valid index, whatever the size. It is the same as declaring Dim MyArray(10), which has valid indexes 0 to 10, and then trying to access MyArray(11) = something.
You should either declare the arrays with a proper size, or ReDim them at run time:
Dim productCodes(10) As String
or
Dim productCodes() As String
Dim products(,) As String
Dim position As Integer = 0
'code here
While line IsNot Nothing
    ' ReDim Preserve can only resize the last dimension, so the row index goes last
    ReDim Preserve productCodes(position)
    ReDim Preserve products(2, position)
    Dim split() As String = line.Split("|"c)
    productCodes(position) = split(0)
    products(0, position) = split(1)
    products(1, position) = split(2)
    products(2, position) = split(3)
    line = fs.ReadLine()
    position += 1
End While
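Resizing arrays on every line copies the data each time. As an alternative to ReDim Preserve (not what the answer above uses), a List(Of T) grows on its own. The sketch below assumes each line has four "|"-separated fields; the Product class, its property names, and the sample rows are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module StockDemo
    ' Hypothetical record type; the property names are assumptions.
    Class Product
        Public Property Code As String
        Public Property Fields As String()
    End Class

    Function ParseProducts(lines As IEnumerable(Of String)) As List(Of Product)
        Dim result As New List(Of Product)()
        For Each line As String In lines
            Dim parts As String() = line.Split("|"c)
            ' Skip malformed rows instead of indexing past the end.
            If parts.Length < 4 Then Continue For
            result.Add(New Product With {
                .Code = parts(0),
                .Fields = {parts(1), parts(2), parts(3)}
            })
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample rows, for illustration only.
        Dim products = ParseProducts({"0001|Widget|2.50|12", "bad row", "0002|Gadget|4.00|3"})
        Console.WriteLine(products.Count)
        Console.WriteLine(products(1).Code)
    End Sub
End Module
```

With a real file, File.ReadLines(path) could be passed straight to ParseProducts.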

Progress bar with VB.NET Console Application

I've written a parsing utility as a Console Application and have it working pretty smoothly. The utility reads delimited files and, based on a user value passed as a command-line argument, writes each record to one of two files (good records or bad records).
I'm looking to add a progress bar or status indicator that shows the work performed or remaining while parsing. I could easily write a <.> across the screen within the loop, but I would like to show a percentage.
Thanks!
Here is an example of how to calculate the percentage complete and output it in a progress counter:
Option Strict On
Option Explicit On
Imports System.IO
Module Module1
Sub Main()
Dim filePath As String = "C:\StackOverflow\tabSeperatedFile.txt"
Dim FileContents As String()
Console.WriteLine("Reading file contents")
Using fleStream As StreamReader = New StreamReader(IO.File.Open(filePath, FileMode.Open, FileAccess.Read))
FileContents = fleStream.ReadToEnd.Split(CChar(vbTab))
End Using
Console.WriteLine("Sorting Entries")
Dim TotalWork As Decimal = CDec(FileContents.Count)
Dim currentLine As Decimal = 0D
For Each entry As String In FileContents
'Do something with the file contents
currentLine += 1D
Dim progress = CDec((currentLine / TotalWork) * 100)
Console.SetCursorPosition(0I, Console.CursorTop)
Console.Write(progress.ToString("00.00") & " %")
Next
Console.WriteLine()
Console.WriteLine("Finished.")
Console.ReadLine()
End Sub
End Module
First you have to know how many lines to expect.
In your loop, calculate "intCurrentLine * 100 / intLineCount":
int totalLines = 0; // replace 0 with the real count ("GetTotalLines")
int currentLine = 0;
foreach (var line in Lines)
{
/// YOUR OPERATION
currentLine++;
int progress = currentLine * 100 / totalLines;
///print out the result with the suggested method...
///!Caution: if there are many updates, consider updating the output only when the value has changed, or only every n iterations by using the MOD operator or any other useful approach ;)
}
Then print the result at the same position in your loop by using the SetCursorPosition method:
MSDN Console.SetCursorPosition
VB.NET:
Dim totalLines As Integer = 0 ' replace 0 with the real count ("GetTotalLines")
Dim currentLine As Integer = 0
For Each line As String In Lines
' Your operation
currentLine += 1
' Multiply first, then use integer division
Dim progress As Integer = currentLine * 100 \ totalLines
' print out the result with the suggested method...
' !Caution: if there are many updates, consider updating the output only when the value has changed, or only every n iterations by using the Mod operator or any other useful approach
Next
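The caution about frequent updates can be handled by remembering the last percentage written and redrawing only when it changes. This is a minimal sketch under that assumption; PercentDone and the loop bound of 2500 are made up for the example, and a carriage return is used instead of Console.SetCursorPosition to return to the start of the line.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module ProgressDemo
    ' Integer percentage of work done. Multiplying before dividing keeps
    ' integer division from truncating the result to zero.
    Function PercentDone(current As Integer, total As Integer) As Integer
        If total <= 0 Then Return 100
        Return CInt(CLng(current) * 100 \ total)
    End Function

    Sub Main()
        Dim total As Integer = 2500 ' assumed number of records
        Dim lastShown As Integer = -1
        For current As Integer = 1 To total
            ' ... process one record here ...
            Dim percent As Integer = PercentDone(current, total)
            If percent <> lastShown Then
                ' vbCr moves back to column 0 so the old text is overwritten.
                Console.Write(vbCr & percent.ToString().PadLeft(3) & " %")
                lastShown = percent
            End If
        Next
        Console.WriteLine()
    End Sub
End Module
```

With this check the console is written to at most 101 times, however many records are processed.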
Well, the easiest way is to update the progress bar variable often.
For example, if your code consists of around 100 lines, or maybe 100 pieces of functionality,
update the progress bar variable with the percentage after each function or block of lines :)

vb.net access the streamreader.readline which is currently looping

In C#, the following pattern works: it lets you call StreamReader.ReadLine() in the loop condition
and then use that very same line inside the loop body.
string line;
double score = 0;
count = 0;
while ((line = sr.ReadLine()) != null)
{
score += double.Parse (line);
count++;
}
averageScore = (double)inValue / count;
This does not work in VB.NET though, even with all the converters I tried.
This one completely bombs out and throws "page not found":
http://converter.telerik.com/
This one:
http://www.developerfusion.com/tools/convert/csharp-to-vb/?batchId=3356dd81-818a-43d1-8c23-be584e40d15e
results in:
Dim line As String
Dim score As Double = 0
count = 0
While (InlineAssignHelper(line, sr.ReadLine())) IsNot Nothing
score += Double.Parse(line)
count += 1
End While
averageScore = CDbl(inValue) / count
where InlineAssignHelper is something I don't have.
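Converted code that calls InlineAssignHelper expects a small generic helper that assigns a value and returns it, because VB.NET has no assignment expression. A minimal sketch of such a helper follows, with the loop wrapped in a hypothetical AverageOfLines function and fed from a StringReader purely so the example is self-contained; parsing with the invariant culture is also an assumption.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Module AverageDemo
    ' Assigns value to target and returns it, mimicking C#'s
    ' "(line = sr.ReadLine()) != null" idiom.
    Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
        target = value
        Return value
    End Function

    ' Hypothetical wrapper: averages one number per line.
    Function AverageOfLines(reader As TextReader) As Double
        Dim line As String = Nothing
        Dim score As Double = 0
        Dim count As Integer = 0
        While InlineAssignHelper(line, reader.ReadLine()) IsNot Nothing
            score += Double.Parse(line, CultureInfo.InvariantCulture)
            count += 1
        End While
        Return If(count = 0, 0.0, score / count)
    End Function

    Sub Main()
        Dim text As String = "10" & Environment.NewLine & "20" & Environment.NewLine & "30"
        Using reader As New StringReader(text)
            Console.WriteLine(AverageOfLines(reader))
        End Using
    End Sub
End Module
```

The helper keeps the converted loop compiling unchanged, though a plain Do loop with Exit Do reads more naturally in VB.NET.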
If you have the filename, instead of just a stream, you can use System.IO.File.ReadLines for the neatest way:
Dim score As Double = 0
count = 0
For Each line In File.ReadLines(filename)
score += Double.Parse(line)
count += 1
Next
averageScore = score / count
Or use LINQ:
Dim averageScore = File.ReadLines(filename).Average(AddressOf Double.Parse)
Use a Do-Loop with an exit statement, like this:
Do
Dim line = sr.ReadLine()
If line Is Nothing Then Exit Do
score += Double.Parse(line)
count += 1
Loop
Do note the advantage of doing it this way over the accepted answer: it is very lean on memory usage and doesn't require the entire file to fit in memory.

Change just one line in a text file?

I have a text file with the format:
(title,price,id#)
CD1,11.00,111111
CD2,12.00,222222
CD3,13.00,333333
CD4,14.00,444444
CD5,15.00,555555
CD6,16.00,666666
What is the best way to go change the price of the appropriate CD if I'm given the id# and new price?
I'm sure it has something do to with getting the line and splitting it, but I'm not sure how I edit just one line and not mess up the whole file.
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). For such a small file it's probably the easiest to change the line in memory and then rewrite all to the file:
Dim idToFind = "444444"
Dim newPrice = "100"
Dim lines = IO.File.ReadAllLines(path)
For i = 0 To lines.Length - 1
Dim line = lines(i)
Dim fields = line.Split(","c)
If fields.Length > 2 Then
Dim id = fields(2)
If id = idToFind Then
Dim title = fields(0)
lines(i) = String.Format("{0},{1},{2}", title, newPrice, id)
Exit For
End If
End If
Next
IO.File.WriteAllLines(path, lines)
Okay, now we know it's a short file, life becomes much easier:
Load the file into an array of lines using File.ReadAllLines
Find the right line using string.Split to split each line into the constituent parts, and check the ID.
When you've found the right line, replace it with the complete new line
Write the file back with File.WriteAllLines
That should be enough to get you going.
If it's just a file with around 25 lines, you could do a simple input-transform-output routine and update the price line by line.
Something like this (using StreamReader / StreamWriter):
Sub UpdatePrice(ByVal pricesToUpdate As Dictionary(Of Integer, String), ByVal inputPath As String)
If Not IO.File.Exists(inputPath) Then Return
Try
Using inputStream = New IO.StreamReader(inputPath, System.Text.Encoding.UTF8, True)
Using outputStream = New IO.StreamWriter(inputPath + ".tmp", False, System.Text.Encoding.UTF8)
While Not inputStream.EndOfStream
Dim inputLine = inputStream.ReadLine
Dim content = inputLine.Split(","c)
If Not content.Length >= 3 Then
outputStream.WriteLine(inputLine)
Continue While
End If
Dim id As Integer
If Not Integer.TryParse(content(2), id) Then
outputStream.WriteLine(inputLine)
Continue While
End If
If Not pricesToUpdate.ContainsKey(id) Then
outputStream.WriteLine(inputLine)
Continue While
End If
content(1) = pricesToUpdate(id)
outputStream.WriteLine(String.Join(",", {content(0), content(1), content(2)}))
End While
End Using
End Using
If IO.File.Exists(inputPath + ".tmp") Then
IO.File.Delete(inputPath)
IO.File.Move(inputPath + ".tmp", inputPath)
End If
Catch ex As IO.IOException
If IO.File.Exists(inputPath + ".tmp") Then IO.File.Delete(inputPath + ".tmp")
End Try
End Sub