How do I replace 'bad characters' in a text file with a space in VB.NET - vb.net

I'm trying to rid a text file of currupt data. I parse the file and if I find a bad character, I replace it with a space. My problem is that the space is not overwriting the bad character. Instead, the space is written on line 10 position 27. What's going on here?
I've been stuck on this seemingly simple problem for half a day. Thanks.
Sub replaceChars(fname As String)
Dim fs As New FileStream(fname, FileMode.Open, FileAccess.ReadWrite)
Dim r As New StreamReader(fs)
Dim w As New StreamWriter(fs)
Dim iChar As Integer = 0
Do Until r.Peek() = -1
iChar = r.Read()
If iChar < 32 Or iChar > 126 Then
If iChar = 13 Or iChar = 10 Then 'cr/lf, continue.
Continue Do
Else 'found a bad char. replace it.
w.Write(Chr(32))
w.Flush()
fs.Flush()
End If
Else
Continue Do
End If
Loop
w.Close()
fs.Close()
End Sub

I really dont like the idea that you share a filestream for a reader and a writter, since I have had lots of 'unreasonable' problems because of that.
I've leanred that when using a Idiposable objects use 'Using' Statement wich waranties that proceses is finished, and you are not getting wired errors.
try this, maybe it helps and it is always better to do it this way.
Using Statement

Related

Removing CR/LF at end of file in VB.net

I've searched for a solution to this, but any I've found are either doing much more than I need or are not exactly what I want.
I have files I want to append to. I need to append to the end of the last line but they all have a carriage return and so I'll end up appending to the new line if I just append as normal.
All I want is to make a subroutine that takes a file path and removes the CR/LF at the end of it, no more, no less. Any help pointing me at a solution to this would be appreciated. I'm surprised there isn't a built in function to do this.
Dim crString = Environment.NewLine '= vbCrLf
Dim crBytes = Encoding.UTF8.GetBytes(crString)
Dim bytesRead(crBytes.Length - 1) as Byte
Dim iOffset As Integer = 0
Dim stringRead As String
Using fs = File.Open("G:\test.txt", FileMode.Open, FileAccess.ReadWrite)
While iOffset < fs.Length
fs.Seek(- (crBytes.Length + iOffset), SeekOrigin.End)
fs.Read(bytesRead,0, crBytes.Length)
stringRead = Encoding.UTF8.GetString(bytesRead)
If stringRead = crString Then
fs.SetLength(fs.Length - (crBytes.Length * iOffset + 1))
Exit While
End If
iOffset += 1
End While
End Using
I open the text file as FileStream and set its position to the end of the file - length of the carriage return string.
I then read the current bytes while decreasing the offset until I found a carriage return or the eof has been reached.
If a CR has been found I remove it and everything what comes after.
If you don´t want that just remove the loop and check the eof only.
But there could be some vbNullString at the eof that´s why I´m using the loop.
Please note that I used UTF8 encoding in my example. If you have other encodings you have to adapt it accordingly.
test.txt before run:
test.txt after code snippet run:
EDIT: fs.SetLength part was wrong in case of last character in file was not a CR.
I have found String.Replace(ControlChars.CrLf.ToCharArray(),"") works.
Probably better ways to do it as well!

How to Read first x characters of each line of a text file

HI have a Program that loops through a text file and reads the lines.
Using r As StreamReader = New StreamReader("C:\test.txt")
Dim lineCount = File.ReadAllLines("C:\test.txt").Length
MsgBox(lineCount)
For x = 1 to linecount
line = r.ReadLine
msgbox (line)
next
How can I read the Left most 15 characters of each line of the text file, ignoring the other characters in each line. Lastly, if there are spaces in the first 15 characters, i'd like to remove them.
Calling ReadAllLines to discover how many lines are there to drive a for..loop is wrong and a waste of resources (Essentially you are reading all the file content two times). In particular consider that a StreamReader has a method that tells you if you have reached the end of file
So everything could be changed to a simpler
Dim clippedLines = new List(Of String)()
Dim line As String
Using r As StreamReader = New StreamReader("C:\test.txt")
While r.Peek() >= 0
line = r.ReadLine
Dim temp = if(line.Length > 15, line.Substring(0,15), line)
clippedLines.Add(temp.Trim().Replace(" "c, ""))
End While
End Using
This will remove all the spaces from the line AFTER taking the first 15 char (and thus the result could be a string shorter than 15 chars) If you want a line of 15 char after the space removing operation, then
While r.Peek() >= 0
line = r.ReadLine.Trim().Replace(" "c, "")
Dim temp = if(line.Length > 15, line.Substring(0,15), line)
clippedLines.Add(temp)
End While

A loop exits prematurely when using StreamReader

It's been a long time since I've programmed. I'm writing a form in VB.NET, and using StreamReader to read a text file and populate an 2D array. Here is the text file:
あかさたなはまやらわん
いきしちにひみ り
うくすつぬふむゆる
えけせてねへめ れ
おこそとのほもよろを
And here is the loop, which is within the Load event.
Dim Line As String
Dim Row As Integer = 0
Using sReader As New IO.StreamReader("KanaTable.txt")
Do
Line = sReader.ReadLine
For i = 0 To Line.Length - 1
KanaTable(Row, i) = Line(i)
Next
Row += 1
Loop Until sReader.EndOfStream
End Using
The problem is, once the i in the For Loop reaches 10, it completes the loop and skips the other lines, even when I have a breakpoint. Can you let me know what's probably going on here?
I've figured out the problem, it was very simple. The array declaration for KanaTable:
Dim KanaTable(4, 9) As Char
should have been
Dim KanaTable(4, 10) As Char
Because there was one less space in the array than there should have been, the debugger must have been throwing an IndexOutOfRange which I couldn't see, because, stupid Windows bug (thanks to Bradley Uffner for pointing out this bug.)
If you can use an array of arrays or a list of arrays (List(Of Char())), you can get this down to a single line of code:
Dim KanaTable()() As Char = IO.File.ReadLines("KanaTable.txt").Select(Function(line) line.ToCharArray()).ToArray()
If that's too complicated for you, we can at least simplify the existing code:
Dim KanaTable As New List(Of Char())
Dim Line As String
Using sReader As New IO.StreamReader("KanaTable.txt")
Line = sReader.ReadLine()
While Line IsNot Nothing
KanaTable.Add(Line.ToCharArray())
Line = sReader.ReadLine()
End While
End Using
I can't see an error immediately, but you could try to adapt your code to this:
Using reader As New IO.StreamReader("KanaTable.txt")
Do
line= reader.ReadLine()
If line = Nothing Then
Exit Do
End If
For i = 0 To Line.Length - 1
KanaTable(Row, i) = Line(i)
Next
Row += 1
Loop
End Using

Start reading massive text file from the end

I would ask if you could give me some alternatives in my problems.
basically I'm reading a .txt log file averaging to 8 million lines. Around 600megs of pure raw txt file.
I'm currently using streamreader to do 2 passes on those 8 million lines doing sorting and filtering important parts in the log file, but to do so, My computer is taking ~50sec to do 1 complete run.
One way that I can optimize this is to make the first pass to start reading at the end because the most important data is located approximately at the final 200k line(s) . Unfortunately, I searched and streamreader can't do this. Any ideas to do this?
Some general restriction
# of lines varies
size of file varies
location of important data varies but approx at the final 200k line
Here's the loop code for the first pass of the log file just to give you an idea
Do Until sr.EndOfStream = True 'Read whole File
Dim streambuff As String = sr.ReadLine 'Array to Store CombatLogNames
Dim CombatLogNames() As String
Dim searcher As String
If streambuff.Contains("CombatLogNames flags:0x1") Then 'Keyword to Filter CombatLogNames Packets in the .txt
Dim check As String = streambuff 'Duplicate of the Line being read
Dim index1 As Char = check.Substring(check.IndexOf("(") + 1) '
Dim index2 As Char = check.Substring(check.IndexOf("(") + 2) 'Used to bypass the first CombatLogNames packet that contain only 1 entry
If (check.IndexOf("(") <> -1 And index1 <> "" And index2 <> " ") Then 'Stricter Filters for CombatLogNames
Dim endCLN As Integer = 0 'Signifies the end of CombatLogNames Packet
Dim x As Integer = 0 'Counter for array
While (endCLN = 0 And streambuff <> "---- CNETMsg_Tick") 'Loops until the end keyword for CombatLogNames is seen
streambuff = sr.ReadLine 'Reads a new line to flush out "CombatLogNames flags:0x1" which is unneeded
If ((streambuff.Contains("---- CNETMsg_Tick") = True) Or (streambuff.Contains("ResponseKeys flags:0x0 ") = True)) Then
endCLN = 1 'Value change to determine end of CombatLogName packet
Else
ReDim Preserve CombatLogNames(x) 'Resizes the array while preserving the values
searcher = streambuff.Trim.Remove(streambuff.IndexOf("(") - 5).Remove(0, _
streambuff.Trim.Remove(streambuff.IndexOf("(")).IndexOf("'")) 'Additional filtering to get only valuable data
CombatLogNames(x) = search(searcher)
x += 1 '+1 to Array counter
End If
End While
Else
'MsgBox("Something went wrong, Flame the coder of this program!!") 'Bug Testing code that is disabled
End If
Else
End If
If (sr.EndOfStream = True) Then
ReDim GlobalArr(CombatLogNames.Length - 1) 'Resizing the Global array to prime it for copying data
Array.Copy(CombatLogNames, GlobalArr, CombatLogNames.Length) 'Just copying the array to make it global
End If
Loop
You CAN set the BaseStream to the desired reading position, you just cant set it to a specfic LINE (because counting lines requires to read the complete file)
Using sw As New StreamWriter("foo.txt", False, System.Text.Encoding.ASCII)
For i = 1 To 100
sw.WriteLine("the quick brown fox jumps ovr the lazy dog")
Next
End Using
Using sr As New StreamReader("foo.txt", System.Text.Encoding.ASCII)
sr.BaseStream.Seek(-100, SeekOrigin.End)
Dim garbage = sr.ReadLine ' can not use, because very likely not a COMPLETE line
While Not sr.EndOfStream
Dim line = sr.ReadLine
Console.WriteLine(line)
End While
End Using
For any later read attempt on the same file, you could simply save the final position (of the basestream) and on the next read to advance to that position before you start reading lines.
What worked for me was skipping first 4M lines (just a simple if counter > 4M surrounding everything inside the loop), and then adding background workers that did the filtering, and if important added the line to an array, while main thread continued reading the lines. This saved about third of the time at the end of a day.

Change just one line in a text file?

I have a text file with the format:
(title,price,id#)
CD1,11.00,111111
CD2,12.00,222222
CD3,13.00,333333
CD4,14.00,444444
CD5,15.00,555555
CD6,16.00,666666
What is the best way to go change the price of the appropriate CD if I'm given the id# and new price?
I'm sure it has something do to with getting the line and splitting it, but I'm not sure how I edit just one line and not mess up the whole file.
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). For such a small file it's probably the easiest to change the line in memory and then rewrite all to the file:
Dim idToFind = "444444"
Dim newPrice = "100"
Dim lines = IO.File.ReadAllLines(path)
For i = 0 To lines.Length - 1
Dim line = lines(i)
Dim fields = line.Split(","c)
If fields.Length > 2 Then
Dim id = fields(2)
If id = idToFind Then
Dim title = fields(0)
lines(i) = String.Format("{0},{1},{2}", title, newPrice, id)
Exit For
End If
End If
Next
IO.File.WriteAllLInes(path, lines)
Okay, now we know it's a short file, life becomes much easier:
Load the file into an array of lines using File.ReadAllLines
Find the right line using string.Split to split each line into the constituent parts, and check the ID.
When you've found the right line, replace it with the complete new line
Write the file back with File.WriteAllLines
That should be enough to get you going.
If its just a file with like 25 lines, you could do a simple input-transform-output routine and update the price per line.
Something like this (Using Streamreader / writer ).
Sub UpdatePrice(ByVal pricesToUpdate As Dictionary(Of Integer, String), ByVal inputPath As String)
If Not IO.File.Exists(inputPath) Then Return
Try
Using inputStream = New IO.StreamReader(inputPath, System.Text.Encoding.UTF8, True)
Using outputStream = New IO.StreamWriter(inputPath + ".tmp", False, System.Text.Encoding.UTF8)
While Not inputStream.EndOfStream
Dim inputLine = inputStream.ReadLine
Dim content = inputLine.Split(","c)
If Not content.Length >= 3 Then
outputStream.WriteLine(inputLine)
Continue While
End If
Dim id As Integer
If Not Integer.TryParse(content(2), id) Then
outputStream.WriteLine(inputLine)
Continue While
End If
If Not pricesToUpdate.ContainsKey(id) Then
outputStream.WriteLine(inputLine)
Continue While
End If
content(1) = pricesToUpdate(id)
outputStream.WriteLine(String.Join(",", {content(0), content(1), content(2)}))
End While
End Using
End Using
If IO.File.Exists(inputPath + ".tmp") Then
IO.File.Delete(inputPath)
IO.File.Move(inputPath + ".tmp", inputPath)
End If
Catch ex As IO.IOException
If IO.File.Exists(inputPath + ".tmp") Then IO.File.Delete(inputPath + ".tmp")
End Try
End Sub