Fastest Method to (read, remove, write) to a Text File - vb.net

I coded a simple program that reads from a Textfile Line by Line and If the current readed Line has alphabetics (a-z A-Z) it will write that Line into an other txt file.
If the current readed line doesn't have alphabetics it wont write that line into a new text file.
I created this for the purpose that I have members registering at my website and some of them are using only numbers as Username. I will filter them out and only save the alphabetic Names. (Focus on this Project please I know i could just use php stuff)
That works great already but it takes a while to read line by line and write into the other text file (Write speed 150kb in 1 Minute - Its not my drive I have a fast ssd).
So I wonder if there is a faster way. I could "readalllines" first but on large files it just freezes my program so I don't know if that works too (I want to focus on large +1gb files)
This is my code so far:
If System.IO.File.Exists(FILE_NAME) = True Then
Dim objReader As New System.IO.StreamReader(FILE_NAME)
Do While objReader.Peek() <> -1
Dim myFile As New FileInfo(output)
Dim sizeInBytes As Long = myFile.Length
If sizeInBytes > splitvalue Then
outcount += 1
output = outputold + outcount.ToString + ".txt"
File.Create(output).Dispose()
End If
count += 1
TextLine = objReader.ReadLine() & vbNewLine
Console.WriteLine(TextLine)
If CheckForAlphaCharacters(TextLine) Then
File.AppendAllText(output, TextLine)
Else
found += 1
Label2.Text = "Removed: " + found.ToString
TextBox1.Text = TextLine
End If
Label1.Text = "Checked: " + count.ToString
Loop
MessageBox.Show("Finish!")
End If

First of all, as hinted by #Sean Skelly updating UI controls - repeatedly - is an expensive operation.
But your bigger problem is File.AppendAllText:
If CheckForAlphaCharacters(TextLine) Then
File.AppendAllText(output, TextLine)
Else
found += 1
Label2.Text = "Removed: " + found.ToString
TextBox1.Text = TextLine
End If
AppendAllText(String, String)
Opens a file, appends the specified string to the file, and then
closes the file. If the file does not exist, this method creates a
file, writes the specified string to the file, then closes the file.
Source
You are repeatedly opening and closing a file, causing overhead. AppendAllText is a convenience method since it performs several operations in one single call but you can now see why it's not performing well in a big loop.
The fix is easy. Open the file once when you start your loop and close it at the end. Make sure that you always close the file properly even when an exception occurs. For that, you can either invoke the Close in a Finally block, or use a context manager, that is keep your file write operations within a Using block.
And you could remove the print to console as well. Display management has a cost too. Or you could print status updates every 10K lines or so.
When you've done all that, you should notice improved performance.

My Final Code - It works a lot faster now (500mbs in 1 minute)
Using sw As StreamWriter = File.CreateText(output)
For Each oneLine As String In File.ReadLines(FILE_NAME)
Try
If changeme = True Then
changeme = False
GoTo Again2
End If
If oneLine.Contains(":") Then
Dim TestString = oneLine.Substring(0, oneLine.IndexOf(":")).Trim()
Dim TestString2 = oneLine.Substring(oneLine.IndexOf(":")).Trim()
If CheckForAlphaCharacters(TestString) = False And CheckForAlphaCharacters(TestString2) = False Then
sw.WriteLine(oneLine)
Else
found += 1
End If
ElseIf oneLine.Contains(";") Or oneLine.Contains("|") Or oneLine.Contains(" ") Then
Dim oneLineReplac As String = oneLine.Replace(" ", ":")
Dim oneLineReplace As String = oneLineReplac.Replace("|", ":")
Dim oneLineReplaced As String = oneLineReplace.Replace(";", ":")
If oneLineReplaced.Contains(":") Then
Dim TestString3 = oneLineReplaced.Substring(0, oneLineReplaced.IndexOf(":")).Trim()
Dim TestString4 = oneLineReplaced.Substring(oneLineReplaced.IndexOf(":")).Trim()
If CheckForAlphaCharacters(TestString3) = False And CheckForAlphaCharacters(TestString4) = False Then
sw.WriteLine(oneLineReplaced)
Else
found += 1
End If
Else
errors += 1
textstring = oneLine
End If
Else
errors += 1
textstring = oneLine
End If
count += 1
Catch
errors += 1
textstring = oneLine
End Try
Next
End Using

Related

trouble with interrupting a Batch file started with VBA

I have an Excel file with a lot of PC's names in a server, I want to execute the "systeminfo" command and get the OS out of it. Then the OS shall be put into an Excel cell automatically. To do so, I used the following codes, respectively in the VBA file and the batch file.
however, whenever the server can't reach a pc, the cmd window is stuck until I manually close it. Since the list is actually 148 names long, knowing of a way to automatically close those Windows after, say, 8 seconds would be really helpful.
I tried to look up for a way to multi-thread VBA, just to find out that It is a single-threaded Language. I then tried to start another batch file with the one I'm actually using as to forcefuly kill it afetr a set of time, but it seems that the second batch starts only after the first is terminated, making it useless.
VBA
Sub Test()
'
' Test Macro
' I'm not an expert in VBA, I just picked it up for this task, so a lot of code will result redundant. Bear with me
'
'
Dim i As Integer
'a is basically i-1.
a = 1
' I needed 148 cells for the project
Dim models(1 To 147) As String
For i = 2 To 148
models(a) = Cells(i, 3).Value
a = a + 1
Next i
a = 1
For i = 2 To 148
'not totally sure what the next five lines actually do, but "metodo" is the name of the batch file.
Dim strShellCommand As String
strShellCommand = "C:\Users\Administrator\Desktop\metodo.bat " + models(a)
Set oSh = CreateObject("WScript.Shell")
Set oEx = oSh.Exec(strShellCommand)
strBuf = oEx.StdOut.readAll
'I took out of the string everything that wasn't purely the OS name
Dim FinalString As String
FinalString = Right(strBuf, 26)
FinalString = Left(FinalString, 25)
'this is the line that prints the OS names into Excel cells
ActiveSheet.Cells(i, 10) = FinalString
a = a + 1
Next i
End Sub
then there is the Batch file
set nome=%1
shift
systeminfo /s %nome% |findstr /c:"Microsoft Windows "
u can do a control loop after the Set oEx = oSh.Exec(strShellCommand)
like :
Set oEx = oSh.Exec(strShellCommand)
LoopCount = 0
Do 'Control loop
wscript.Sleep 1000
If TimeOut > 0 Then LoopCount = LoopCount + 1
Loop Until (oEx.Status <> 0) Or (LoopCount > TimeOut * 8)
If oEx.Status = 0 Then 'Timeout occured
oEx.Terminate
ReturnValue = "[Process terminated after timeout!]" & VbCrLf
Else
ReturnValue = "[Process completed]" & VbCrLf
End If
each loop takes 1 second (wscript.Sleep 1000) and the (LoopCount > TimeOut * 8) sets the total time to 8 seconds
good luck

FileInfo returning wrong value?

Okay, so I'm working in VB.NET, manually writing error logs to log files (yes, I know, I didn't make the call). Now, if the files are over an arbitrary size, when the function goes to write out the new error data, it should start a new file with a new file name.
Here's the function:
Dim listener As New Logging.FileLogTraceListener
listener.CustomLocation = System.Configuration.ConfigurationManager.AppSettings("LogDir")
Dim loc As String = DateTime.UtcNow.Year.ToString + DateTime.UtcNow.Month.ToString + DateTime.UtcNow.Day.ToString + DateTime.UtcNow.Hour.ToString + DateTime.UtcNow.Minute.ToString
listener.BaseFileName = loc
Dim logFolder As String
Dim source As String
logFolder = ConfigurationManager.AppSettings("LogDir")
If ex.Data.Item("Source") Is Nothing Then
source = ex.Source
Else
source = ex.Data.Item("Source").ToString
End If
Dim errorFileInfo As New FileInfo(listener.FullLogFileName)
Dim errorLengthInBytes As Long = errorFileInfo.Length
If (errorLengthInBytes > CType(System.Configuration.ConfigurationManager.AppSettings("maxFileSizeInBytes"), Long)) Then
listener.BaseFileName = listener.BaseFileName + "1"
End If
Dim msg As New System.Text.StringBuilder
If String.IsNullOrEmpty(logFolder) Then logFolder = ConfigurationManager.AppSettings("LogDir")
msg.Append(vbCrLf & "Exception" & vbCrLf)
msg.Append(vbTab & String.Concat("App: AppMonitor | Time: ", Date.Now.ToString) & vbCrLf)
msg.Append(vbTab & String.Concat("Source: ", source, " | Message: ", ex.Message) & vbCrLf)
msg.Append(vbTab & "Stack: " & ex.StackTrace & vbCrLf)
listener.Write(msg.ToString())
listener.Flush()
listener.Close()
I have this executing in a loop for testing purposes, so I can see what happens when it gets (say) 10000 errors in all at once. Again, I know there are better ways to handle this systemically, but this was the code I was told to implement.
How can I reliably get the size of the log file before writing to it, as I try to do above?
Well, as with many things, the answer to this turned out to be "did you read your own code closely" with a side order of "eat something, you need to fix your blood sugar."
On review, I saw that I was always checking BaseFileName and, if it was over the arbitrary limit, appending a character and writing to that file. What I didn't do was check to see if that file or, indeed, other more recent files existed. I've solved the issue be grabbing a directory list of all the files matching the "BaseFileName*" argument in Directory.GetFiles and selecting the most recently accessed one. That ensures that the logger will always select the more current file to write to or -if necessary- use as the base-name for another appended character.
Here's that code:
Dim directoryFiles() As String = Directory.GetFiles(listener.Location.ToString(), listener.BaseFileName + "*")
Dim targetFile As String = directoryFiles(0)
For j As Integer = 1 To directoryFiles.Count - 1 Step 1
Dim targetFileInfo As New FileInfo(targetFile)
Dim compareInfo As New FileInfo(directoryFiles(j))
If (targetFileInfo.LastAccessTimeUtc < compareInfo.LastAccessTimeUtc) Then
targetFile = directoryFiles(j)
End If
Next
Dim errorFileInfo As New FileInfo(listener.Location.ToString() + targetFile)
Dim errorLengthInBytes As Long = errorFileInfo.Length

Reading the next line from a text file

I'm working on an RPG type game for a project and I am stuck.
Basically, this code searches for a name in a text file (structure: odds as names and evens as levels). It then needs to output the next line which is the level they where on. I have the counter (variable "count") to output the right text line in which the level is written but I can not use that count to read that line (using "FileSystem.LineInput(count)").
Here is my full code:
Sub LoadGame()
Dim filename, filepath, searchitem, question, read As String
Dim found As Boolean
Dim count As Integer = 1
filename = "save.txt"
filepath = CurDir() & "\" & filename
searchitem = name
FileOpen(1, filename, OpenMode.Input)
Do While Not EOF(1)
read = LineInput(1)
If read = searchitem Then
found = True
Exit Do
Else
found = False
End If
count = count + 1
Loop
If found = True Then
If count >= 3 Then
count = count + 1
End If
question = FileSystem.LineInput(count) ' This bit is broken
Console.WriteLine("Found save game... Loading: " & question)
Console.ReadLine()
Console.BackgroundColor = ConsoleColor.Black
Console.ForegroundColor = ConsoleColor.Red
Console.Clear()
Race(question)
Else
Console.WriteLine("No save game...")
Console.ReadLine()
End If
FileClose(1)
End Sub
I am not sure what is wrong but any help would be greatly appreciated (using VB 2010)
LineInput reads the next line of the specified file (parameter FileNumber).
Your file has the FileNumber 1 and the file pointer points to the desired line. Therefore, it should be sufficient to
question = FileSystem.LineInput(1)
In my oppinion, you should avoid those kinds of file access (per FileNumber) in .Net. This is just an old relict from VB6 times. In .Net you have easy-to-use classes such as StreamReader for that purpose. But if you want to do it the old-fashioned way, at least use the FreeFile method to define the file number.

Start reading massive text file from the end

I would ask if you could give me some alternatives in my problems.
basically I'm reading a .txt log file averaging to 8 million lines. Around 600megs of pure raw txt file.
I'm currently using streamreader to do 2 passes on those 8 million lines doing sorting and filtering important parts in the log file, but to do so, My computer is taking ~50sec to do 1 complete run.
One way that I can optimize this is to make the first pass to start reading at the end because the most important data is located approximately at the final 200k line(s) . Unfortunately, I searched and streamreader can't do this. Any ideas to do this?
Some general restriction
# of lines varies
size of file varies
location of important data varies but approx at the final 200k line
Here's the loop code for the first pass of the log file just to give you an idea
Do Until sr.EndOfStream = True 'Read whole File
Dim streambuff As String = sr.ReadLine 'Array to Store CombatLogNames
Dim CombatLogNames() As String
Dim searcher As String
If streambuff.Contains("CombatLogNames flags:0x1") Then 'Keyword to Filter CombatLogNames Packets in the .txt
Dim check As String = streambuff 'Duplicate of the Line being read
Dim index1 As Char = check.Substring(check.IndexOf("(") + 1) '
Dim index2 As Char = check.Substring(check.IndexOf("(") + 2) 'Used to bypass the first CombatLogNames packet that contain only 1 entry
If (check.IndexOf("(") <> -1 And index1 <> "" And index2 <> " ") Then 'Stricter Filters for CombatLogNames
Dim endCLN As Integer = 0 'Signifies the end of CombatLogNames Packet
Dim x As Integer = 0 'Counter for array
While (endCLN = 0 And streambuff <> "---- CNETMsg_Tick") 'Loops until the end keyword for CombatLogNames is seen
streambuff = sr.ReadLine 'Reads a new line to flush out "CombatLogNames flags:0x1" which is unneeded
If ((streambuff.Contains("---- CNETMsg_Tick") = True) Or (streambuff.Contains("ResponseKeys flags:0x0 ") = True)) Then
endCLN = 1 'Value change to determine end of CombatLogName packet
Else
ReDim Preserve CombatLogNames(x) 'Resizes the array while preserving the values
searcher = streambuff.Trim.Remove(streambuff.IndexOf("(") - 5).Remove(0, _
streambuff.Trim.Remove(streambuff.IndexOf("(")).IndexOf("'")) 'Additional filtering to get only valuable data
CombatLogNames(x) = search(searcher)
x += 1 '+1 to Array counter
End If
End While
Else
'MsgBox("Something went wrong, Flame the coder of this program!!") 'Bug Testing code that is disabled
End If
Else
End If
If (sr.EndOfStream = True) Then
ReDim GlobalArr(CombatLogNames.Length - 1) 'Resizing the Global array to prime it for copying data
Array.Copy(CombatLogNames, GlobalArr, CombatLogNames.Length) 'Just copying the array to make it global
End If
Loop
You CAN set the BaseStream to the desired reading position, you just cant set it to a specfic LINE (because counting lines requires to read the complete file)
Using sw As New StreamWriter("foo.txt", False, System.Text.Encoding.ASCII)
For i = 1 To 100
sw.WriteLine("the quick brown fox jumps ovr the lazy dog")
Next
End Using
Using sr As New StreamReader("foo.txt", System.Text.Encoding.ASCII)
sr.BaseStream.Seek(-100, SeekOrigin.End)
Dim garbage = sr.ReadLine ' can not use, because very likely not a COMPLETE line
While Not sr.EndOfStream
Dim line = sr.ReadLine
Console.WriteLine(line)
End While
End Using
For any later read attempt on the same file, you could simply save the final position (of the basestream) and on the next read to advance to that position before you start reading lines.
What worked for me was skipping first 4M lines (just a simple if counter > 4M surrounding everything inside the loop), and then adding background workers that did the filtering, and if important added the line to an array, while main thread continued reading the lines. This saved about third of the time at the end of a day.

Change just one line in a text file?

I have a text file with the format:
(title,price,id#)
CD1,11.00,111111
CD2,12.00,222222
CD3,13.00,333333
CD4,14.00,444444
CD5,15.00,555555
CD6,16.00,666666
What is the best way to go change the price of the appropriate CD if I'm given the id# and new price?
I'm sure it has something do to with getting the line and splitting it, but I'm not sure how I edit just one line and not mess up the whole file.
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). For such a small file it's probably the easiest to change the line in memory and then rewrite all to the file:
Dim idToFind = "444444"
Dim newPrice = "100"
Dim lines = IO.File.ReadAllLines(path)
For i = 0 To lines.Length - 1
Dim line = lines(i)
Dim fields = line.Split(","c)
If fields.Length > 2 Then
Dim id = fields(2)
If id = idToFind Then
Dim title = fields(0)
lines(i) = String.Format("{0},{1},{2}", title, newPrice, id)
Exit For
End If
End If
Next
IO.File.WriteAllLInes(path, lines)
Okay, now we know it's a short file, life becomes much easier:
Load the file into an array of lines using File.ReadAllLines
Find the right line using string.Split to split each line into the constituent parts, and check the ID.
When you've found the right line, replace it with the complete new line
Write the file back with File.WriteAllLines
That should be enough to get you going.
If its just a file with like 25 lines, you could do a simple input-transform-output routine and update the price per line.
Something like this (Using Streamreader / writer ).
Sub UpdatePrice(ByVal pricesToUpdate As Dictionary(Of Integer, String), ByVal inputPath As String)
If Not IO.File.Exists(inputPath) Then Return
Try
Using inputStream = New IO.StreamReader(inputPath, System.Text.Encoding.UTF8, True)
Using outputStream = New IO.StreamWriter(inputPath + ".tmp", False, System.Text.Encoding.UTF8)
While Not inputStream.EndOfStream
Dim inputLine = inputStream.ReadLine
Dim content = inputLine.Split(","c)
If Not content.Length >= 3 Then
outputStream.WriteLine(inputLine)
Continue While
End If
Dim id As Integer
If Not Integer.TryParse(content(2), id) Then
outputStream.WriteLine(inputLine)
Continue While
End If
If Not pricesToUpdate.ContainsKey(id) Then
outputStream.WriteLine(inputLine)
Continue While
End If
content(1) = pricesToUpdate(id)
outputStream.WriteLine(String.Join(",", {content(0), content(1), content(2)}))
End While
End Using
End Using
If IO.File.Exists(inputPath + ".tmp") Then
IO.File.Delete(inputPath)
IO.File.Move(inputPath + ".tmp", inputPath)
End If
Catch ex As IO.IOException
If IO.File.Exists(inputPath + ".tmp") Then IO.File.Delete(inputPath + ".tmp")
End Try
End Sub