VB.Net Replacing Specific Values in a Large Text File - vb.net

I have some large csv files (1.5gb each) where I need to replace specific values. The method I'm currently using is terribly slow and I'm fairly certain that there should be a way to speed this up but I'm just not experienced enough to know what I should be doing. This is my first post and I tried searching through to find something relevant but didn't come across anything. Any help would be appreciated.
My other thought would be to break the file into chunks so that I can read the entire thing into memory, do all of the replacements there and then output to a consolidated file. I tried this but the way I did it actually ended up seeming slower than my current method.
Thanks!
Sub Main()
Dim fName As String = "2009.csv"
Dim wrtFile As String = "2009.1.csv"
Dim lRead
Dim lwrite As String
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim bulkWrite As String
bulkWrite = ""
Do While strRead.Peek <> -1
lRead = Split(strRead.ReadLine(), ",")
If lRead(9) = "5MM+" Then lRead(9) = "5000000"
If lRead(9) = "1MM+" Then lRead(9) = "1000000"
lwrite = ""
For i = LBound(lRead) To UBound(lRead)
lwrite = lwrite & lRead(i) & ","
Next
strWrite.WriteLine(lwrite)
Loop
strRead.Close()
strWrite.Close()
End Sub

You are splitting and the combining, which can take some time.
Why not just read the line of text. Then replace any occurance of "5MM+" and "1MM+" with the approiate value and then write the line.
Do While ...
s = strRead.ReadLine();
s = s.Replace("5MM+", "5000000")
s = s.Replace("1MM+", "1000000")
strWrite(s);
Loop

Related

Array Out of bounds error VB

Sorry for the terrible wording on my last question, I was half asleep and it was midnight. This time I'll try to be more clear.
I'm currently writing some code for a mini barcode scanner and stock manager program. I've got the input and everything sorted out, but there is a problem with my arrays.
I'm currently trying to extract the contents of the stock file and sort them out into product tables.
This is my current code for getting the data:
Using fs As StreamReader = New StreamReader("The File Path (Is private)")
Dim line As String = "ERROR"
line = fs.ReadLine()
While line <> Nothing
Dim pos As Integer = 0
Dim split(3) As String
pos = products.Length
split = line.Split("|")
productCodes(productCodes.Length) = split(0)
products(products.Length, 0) = split(1)
products(products.Length, 1) = split(2)
products(products.Length, 2) = split(3)
line = fs.ReadLine()
End While
End Using
I have made sure that the file path does, in fact, go to the file. I have looked through debug to find that all the data is going through into my "split" table. The error throws as soon as I start trying to transfer the data.
This is where I declare the two tables being used:
Dim productCodes() As String = {}
Dim products(,) As Object = {}
Can somebody please explain why this is happening?
Thanks in advance
~Hydro
By declaring the arrays like you did:
Dim productCodes() As String = {}
Dim products(,) As Object = {}
You are assigning size 0 to all your arrays, so during your loop, it will eventually try to access a position that haven't been previously declared to the compiler. It is the same as declaring an array of size 10 Dim MyArray(10) and try to access the position 11 MyArray(11) = something.
You should either declare it with a proper size, or redim it during execution time:
Dim productCodes(10) As String
or
Dim productCodes() As String
Dim Products(,) As String
Dim Position as integer = 0
'code here
While line <> Nothing
Redim Preserve productCodes(Position)
Redim Preserve products(2,Position)
Dim split(3) As String
pos = products.Length
split = line.Split("|")
productCodes(Position) = split(0)
products(0,Position) = split(1)
products(1,Position) = split(2)
products(2,Position) = split(3)
line = fs.ReadLine()
Position+=1
End While

VB read/write 5 gb text files

I have an application that reads a 5gb text file line by line and converts double quoted strings that are comma delimited to pipe delimited format.
i.e. "Smith, John","Snow, John" --> Smith, John|Snow, John
I have provided my code below. My question is: Is there a more efficient way of processing large files?
Dim fName As String = "C:\LargeFile.csv"
Dim wrtFile As String = "C:\ProcessedFile.txt"
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim line As String = ""
Do While strRead.Peek <> -1
line = strRead.ReadLine
Dim pattern As String = "(,)(?=(?:[^""]|""[^""]*"")*$)"
Dim replacement As String = "|"
Dim regEx As New Regex(pattern)
Dim newLine As String = regEx.Replace(line, replacement)
newLine = newLine.Replace(Chr(34), "")
strWrite.WriteLine(newLine)
Loop
strWrite.Close()
UPDATED CODE
Dim fName As String = "C:\LargeFile.csv"
Dim wrtFile As String = "C:\ProcessedFile.txt"
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim line As String = ""
Do While strRead.Peek <> -1
line = strRead.ReadLine
line = line.Replace(Chr(34) + Chr(44) + Chr(34), "|")
line = line.Replace(Chr(34), "")
strWrite.WriteLine(line)
Loop
strWrite.Close()
I tested your code and attempted to make a speed improvement by accumulating output lines into a StringBuilder. I also moved the regex declaration outside the loop.
When that did not work, I examined the CPU usage and disk I/O with Windows Process Monitor and it turned out that the bottleneck is the CPU (even when using an HDD instead of an SSD).
That led me to try an alternative method for modifying the text: if all you need to do is replace "," with | and remove any remaining double-quotes, then
newLine = line.Replace(""",""", "|").Replace("""", "")
turns out to be much faster (roughly fourfold in my testing) than using a regex.
(Further improvement might be possible with multi-threading, as #Werdna suggested, as long as more than one processor is available and you can coordinate writing back the modified data in the correct order.)

Syntax Out Of Range Unhandled

I'm a bit of a newbie so any advice would be great. I have a program
that opens a CSV, and then saves it as a csv with a different name. there will be a set of rules to change fields but haven't got that far yet.
when I run this on a small csv file (about 4 columns and rows) it works fine, but with a larger file, it fails with the error above. i'm sure its something daft but I I'm at a loss.
Thanks,
Dean
Dim FileName = tbOpen.Text
Dim fileout = tbSave.Text
Dim lines = File.ReadAllLines(FileName)
Dim output As New List(Of String)
For Each line In lines
Dim fields = line.Split(","c)
If fields(1) = "" Then 'This is where the error is triggered
fields(1) = "Norman"
End If
If fields(3) = "" Then
fields(3) = "Blue Leather"
End If
If fields(4) = "" Then
fields(3) = "Interlined"
End If
output.Add(String.Join(","c, fields))
Next
File.WriteAllLines(fileout, output)
Try
Dim a As String = My.Computer.FileSystem.ReadAllText(tbSave.Text)
Dim b As String() = a.Split(vbNewLine)
ListBox2.Items.AddRange(b)
Catch ex As Exception
MsgBox("error")
End Try
Keep in mind that arrays start a index 0. VB is notorious for stretching this concept. Normally, when you declare an array with 10 elements the indexes would be from 0 - 9. With VB, on the other hand, the indexes will be from 0 - 10 which will actually give you 11 elements.

for loop : string & number without keep adding &

I'm learning for loop and I cannot get this problem fixed.
The problems are in the following codes.
dim rt as integer = 2
dim i As Integer = 0
dim currentpg as string = "http://homepg.com/"
For i = 0 To rt
currentpg = currentpg & "?pg=" & i
messagebox.show(currentpg)
next
'I hoped to get the following results
http://homepg.com/?pg=0
http://homepg.com/?pg=1
http://homepg.com/?pg=2
'but instead I'm getting this
http://homepg.com/?pg=0
http://homepg.com/?pg=0?pg=0
http://homepg.com/?pg=0?pg=0?pg=0
Please help me
Thank you.
You probably need something like this:
Dim basepg as string = "http://homepg.com/"
For i = 0 To rt
Dim currentpg As String = basepg & "?pg=" & i
messagebox.show(currentpg)
Next
Although a proper approach would be to accumulate results into a List(Of String), and then display in a messagebox once (or a textbox/file, if too many results). You don't want to bug user for every URL (what if there are 100 of them?). They would get tired of clicking OK.
First of all, you went wrong while copying the output of the buggy code. Here is the real one.
http://homepg.com/?pg=0
http://homepg.com/?pg=0?pg=1
http://homepg.com/?pg=0?pg=1?pg=2
It does not work because currentpg should be a constant but it is changed on each iteration.
Do not set, just get.
MessageBox.Show(currentpg & "?pg=" & i)
Or you can use another variable to make it more readable.
Dim newpg As String = currentpg & "?pg=" & i
MessageBox.Show(newpg)
Also, your code is inefficient. I suggest you to change it like this.
Dim iterations As Integer = 2
Dim prefix As String = "http://homepg.com/?pg="
For index As Integer = 0 To iterations
MessageBox.Show(prefix & index)
Next

Creating Fixed Width files from strings

I have searched high and low on the internet and I can't find a straight answer to this !
I have a file that has approx 100,000 characters in one long line.
I need to read this file in and write it out again in its entirety, in lines 102 character long ending with VbCrLf. There are no delimiters.
I thought there were a number of ways to tackle issues like this in VB Script... but
apparently not !
Can anyone please provide me with a pointer ?
Here's something (off the top of my head - untested!) that should get you started.
Const ForReading = 1
Const ForWriting = 2
Dim sNewLine
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("OldFile.txt", ForReading) ' Your input file
Set tsOut = fso.OpenTextFile("NewFile.txt", ForWriting) ' New (output) file
While Not tsIn.AtEndOfStream ' While there is still text
sNewLine = tsIn.Read(102) ' Read 120 characters
tsOut.Write sNewLine & vbCrLf ' Write out to new file + CR/LF
Wend ' Loop to repeat
tsIn.Close
tsOut.Close
I won't cover the reading of files, since that is stuff you can find everywhere. And since it's been years I've coded in vb or vbscript, I hope that .net code will suffice.
pseudo: read line from file, put it in for example a string (performance issues anyone?).
A simple algorithm would be and this might have performance issues (multithreading, parallel could be a solution):
Public Sub foo()
Dim strLine As String = "foo²"
Dim strLines As List(Of String) = New List(Of String)
Dim nrChars = strLine.ToCharArray.Count
Dim iterations = nrChars / 102
For i As Integer = 0 To iterations - 1
strLines.Add(strLine.Substring(0, 102))
strLine = strLine.Substring(103)
Next
'save it to file
End Sub