Split string from file - vb.net

I have a file in which each line holds an author and a book title (e.g., "Douglas Adams,The Hitchhiker's Guide To The Galaxy" is one line of the file). I can read each line into a temporary string, but when I split it at the comma to put the author and book in different arrays, it won't work.
Here is my code:
objReader = New StreamReader(AppPath() + "books\books.txt")
i = 1
Dim temp() As String
Dim tempStr As String
Do While objReader.Peek() <> -1
tempStr = objReader.ReadLine()
temp = tempStr.Split(New Char() {","c})
temp(0) = authors(i)
temp(1) = books(i)
i = i + 1
Loop
I already initialized objReader and i earlier, and I imported System.IO, too.
I have tried to change the delimiters to semicolons, slashes, and backslashes in both the code and the file, but it doesn't work. I can confirm the file loads correctly.

You have to assign the split strings to the arrays; you're doing it the other way around:
authors(i) = temp(0)
books(i) = temp(1)
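For context, here is a minimal sketch of the whole loop with the assignment direction corrected. It is not the asker's exact code: it assumes `List(Of String)` collections instead of pre-sized arrays, and the names `BookSplitDemo` and `SplitBookLines` are hypothetical. Skipping lines without a comma and limiting the split to two parts are also assumptions.

```vbnet
Imports System
Imports System.Collections.Generic

Module BookSplitDemo
    ' Splits "author,title" lines into two parallel lists.
    ' Lines without a comma are skipped (an assumption, not part of the question).
    Sub SplitBookLines(lines As IEnumerable(Of String),
                       authors As List(Of String),
                       books As List(Of String))
        For Each line As String In lines
            ' Limit the split to 2 parts so commas inside the title are kept.
            Dim temp() As String = line.Split(New Char() {","c}, 2)
            If temp.Length < 2 Then Continue For
            ' Assign FROM the split result TO the collections.
            authors.Add(temp(0).Trim())
            books.Add(temp(1).Trim())
        Next
    End Sub

    Sub Main()
        Dim authors As New List(Of String)
        Dim books As New List(Of String)
        ' With a real file, the lines could come from File.ReadLines(path).
        Dim sample() As String = {"Douglas Adams,The Hitchhiker's Guide To The Galaxy"}
        SplitBookLines(sample, authors, books)
        Console.WriteLine(authors(0) & " | " & books(0))
    End Sub
End Module
```

Using lists avoids having to know the number of lines in advance, which the fixed-size arrays in the question require.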

Related

Select text between key words

This is a follow on question to Select block of text and merge into new document
I have an SGM document to which I have added start and stop comments. I need to extract the strings between the start/stop comments so I can put them in a temporary file for modification. Right now my code selects everything, including the start/stop comments and data outside of them.
Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As IO.FileInfo() = Directory.GetFiles("*.sgm")
Dim singleFile As IO.FileInfo
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
If File.Exists(singleFile.FullName) Then
Dim fileName = singleFile.FullName
Debug.Print("file name : " & fileName)
' A backup first
Dim backup As String = fileName & ".bak"
File.Copy(fileName, backup, True)
' Load lines from the source file in memory
Dim lines() As String = File.ReadAllLines(backup)
' Now re-create the source file and start writing lines inside a block
' Evaluate all the lines in the file.
' Set insideBlock to false
Dim insideBlock As Boolean = False
Using sw As StreamWriter = File.CreateText(backup)
For Each line As String In lines
If line = startMark Then
' start writing at the line below
insideBlock = True
' Evaluate if the next line is <!Stop>
ElseIf line = stopMark Then
' Stop writing
insideBlock = False
ElseIf insideBlock = True Then
' Write the current line in the block
sw.WriteLine(line)
End If
Next
End Using
End If
Next
This is the example text to test on.
<chapter id="Chapter_Overview"> <?Pub Lcl _divid="500" _parentid="0">
<title>Learning how to gather data</title>
<!--#start#-->
<section>
<title>ALTERNATE MISSION EQUIPMENT</title>
<para0 verdate="18 Jan 2019" verstatus="ver">
<title>
<applicabil applicref="xxx">
</applicabil>Three-Button Trackball Mouse</title>
<para>This is the example to grab all text between start and stop comments.
</para></para0>
</section>
<!--#stop#-->
Things to note: the start and stop comments ALWAYS fall on a new line, and a document can have multiple start/stop sections.
I thought maybe using a regex on this
(<section>[\w+\w]+.*?<\/section>)\R(<\?Pub _gtinsert.*>\R<pgbrk pgnum.*?>\R<\?Pub /_gtinsert>)*
Or maybe use IndexOf and LastIndexOf, but I couldn't get that working.
You can read the entire file and split it using the string array {"<!--#start#-->", "<!--#stop#-->"} as the separators, which gives:
Element 0: Text before "<!--#start#-->"
Element 1: Text between "<!--#start#-->" and "<!--#stop#-->"
Element 2: Text after "<!--#stop#-->"
Take element 1 and write it to your backup.
Dim text = File.ReadAllText(backup).Split({startMark, stopMark}, StringSplitOptions.RemoveEmptyEntries)(1)
Using sw As StreamWriter = File.CreateText(backup)
sw.Write(text)
End Using
Edit to address comment
I did make the original code rather compact. It can be expanded into the following, which allows you to add some validation:
Dim text = File.ReadAllText(backup)
Dim split = text.Split({startMark, stopMark}, StringSplitOptions.RemoveEmptyEntries)
If split.Count() <> 3 Then Throw New Exception("File didn't contain one or more delimiters.")
text = split(1)
Using sw As StreamWriter = File.CreateText(backup)
sw.Write(text)
End Using
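The question mentions that a document can contain several start/stop sections, which the single-split approach above does not cover. A minimal sketch of one way to collect every block follows, using repeated `IndexOf` calls. The names `BlockExtractDemo` and `ExtractBlocks` are hypothetical, and trimming each block and ignoring an unmatched start marker are assumptions.

```vbnet
Imports System
Imports System.Collections.Generic

Module BlockExtractDemo
    ' Returns the text of every startMark...stopMark block, in document order.
    Function ExtractBlocks(text As String, startMark As String, stopMark As String) As List(Of String)
        Dim blocks As New List(Of String)
        Dim pos As Integer = 0
        Do
            Dim startIdx As Integer = text.IndexOf(startMark, pos, StringComparison.Ordinal)
            If startIdx < 0 Then Exit Do
            startIdx += startMark.Length
            Dim stopIdx As Integer = text.IndexOf(stopMark, startIdx, StringComparison.Ordinal)
            ' Assumption: an unmatched start marker is ignored.
            If stopIdx < 0 Then Exit Do
            ' Trim removes the line breaks surrounding the block.
            blocks.Add(text.Substring(startIdx, stopIdx - startIdx).Trim())
            pos = stopIdx + stopMark.Length
        Loop
        Return blocks
    End Function

    Sub Main()
        Dim sample As String = "head<!--#start#-->one<!--#stop#-->middle<!--#start#-->two<!--#stop#-->tail"
        For Each block As String In ExtractBlocks(sample, "<!--#start#-->", "<!--#stop#-->")
            Console.WriteLine(block)
        Next
    End Sub
End Module
```

With a real file, the `text` argument could come from `File.ReadAllText(backup)`, and the returned blocks could be written out with a `StreamWriter` as in the answer above.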

What is the simplest way to get the second item of each row in a string file

I have a String file with 8 items (separated by commas) in each row, e.g., CA,23456,aName,aType,anotherName,aWord,secondword,number. I want to create a new string of items consisting of the 2nd item (an Integer) of each row of the original file. I know there are many ways to do this but someone out there knows how to do it with very few lines of code, which is what I am looking for. I prefer not to use a parser.
The way to show what I have tried is to look at the code below.
Dim sn2 As String = ""
Dim sn2S As String = ""
Using readFile As New StreamReader(newFile1)
Do While readFile.Peek() <> -1
sn2S = readFile.ReadLine(1)
sn2 = sn2 & sn2S & ","
Loop
End Using
The code returns the second character of each row, not the second item. What I hope to get is a string that looks like 123,1345,4325,3321,3456,3211 and so on, where each number is the second item in each row of the original file.
You could split it up by cells
Dim row As String = "CA,23456,aName,aType,anotherName,aWord,secondword,number"
Dim cells() As String = row.Split(","c)
Dim cellValue As String = cells(1)
But in your case, I would just do a search and Substring by the index of the delimiter.
Dim startPosition As Integer = row.IndexOf(",") + 1
Dim endPosition As Integer = row.IndexOf(",", startPosition)
Dim cellValue As String = row.Substring(startPosition, endPosition - startPosition)
If you have the whole file in memory, there could be some regex that could do the job with one pass.
As for this line
sn2 = sn2 & sn2S & ","
You might want to look at using String.Join or a StringBuilder instead of repeated concatenation.
You could try
Dim sn2 As String = ""
Dim sn2S() As String
Using readFile As New StreamReader(newFile1)
Do While readFile.Peek() <> -1
' Read the whole line, then split it into its comma-separated items
sn2S = readFile.ReadLine().Split(","c)
' Index 1 is the second item of the row
sn2 = sn2 & sn2S(1) & ","
Loop
End Using
In one line
Dim sn2 = String.Join(",", File.ReadAllLines(newFile1).Select(Function(s) s.Split(","c)(1)))
From the inside-out:
File.ReadAllLines(newFile1) splits the file into lines and results in a string array holding those lines, which is fed into...
...Select(Function(s) s.Split(","c)(1)) which operates on each line by splitting the line by comma s.Split(","c) and then indexing the resulting array (1) to return the second (zero-based) element. This is fed into...
String.Join(",", ... ) which takes those second elements and joins them together with commas.

Sum a double after split in visual basic

I am pulling data from a txt file that looks like this
Name,123
Dim total As Double = 0
For Each line As String In IO.File.ReadAllLines("file path")
'Dim t As String() = line.Split(New Char() {","c})
Dim parts As String() = line.Split(New Char() {","c})
Dim firstPart As String = parts(1)
total += Double.Parse(firstPart)
It is saying that parts(1) is out of range. Any help would be appreciated.
Clearly one of the lines either doesn't contain a separator, or it contains no text at all.
You should add a check to ignore such lines:
If parts.Length < 2 Then Continue For 'There are not enough parts. Continue with the next line instead.
Dim firstPart As String = parts(1)
total += Double.Parse(firstPart)
- The Continue Statement - MSDN
Strictly speaking, the first part would be parts(0), since arrays are indexed from 0; parts(1) is the second part.
You should also test for blank lines if there can be any. The latter is often an issue with text files that end on a new line.
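Putting both checks together, a minimal sketch of the summing loop might look like this. The names `SumDemo` and `SumSecondColumn` are hypothetical. Using `Double.TryParse` with the invariant culture, and silently skipping values that do not parse, are assumptions that go beyond the answers above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module SumDemo
    ' Sums the second comma-separated item of each line.
    ' Blank lines, lines without a separator, and non-numeric values are skipped.
    Function SumSecondColumn(lines As IEnumerable(Of String)) As Double
        Dim total As Double = 0
        For Each line As String In lines
            If String.IsNullOrWhiteSpace(line) Then Continue For
            Dim parts() As String = line.Split(","c)
            If parts.Length < 2 Then Continue For
            Dim value As Double
            ' Assumption: numbers use "." as the decimal separator.
            If Double.TryParse(parts(1), NumberStyles.Float, CultureInfo.InvariantCulture, value) Then
                total += value
            End If
        Next
        Return total
    End Function

    Sub Main()
        Dim sample() As String = {"Name,123", "", "NoSeparator", "Other,1.5"}
        Console.WriteLine(SumSecondColumn(sample))
    End Sub
End Module
```

With a real file, the argument could be `IO.File.ReadAllLines("file path")` as in the question.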

VB read/write 5 gb text files

I have an application that reads a 5 GB text file line by line and converts double-quoted strings that are comma delimited to pipe-delimited format.
i.e. "Smith, John","Snow, John" --> Smith, John|Snow, John
I have provided my code below. My question is: Is there a more efficient way of processing large files?
Dim fName As String = "C:\LargeFile.csv"
Dim wrtFile As String = "C:\ProcessedFile.txt"
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim line As String = ""
Do While strRead.Peek <> -1
line = strRead.ReadLine
Dim pattern As String = "(,)(?=(?:[^""]|""[^""]*"")*$)"
Dim replacement As String = "|"
Dim regEx As New Regex(pattern)
Dim newLine As String = regEx.Replace(line, replacement)
newLine = newLine.Replace(Chr(34), "")
strWrite.WriteLine(newLine)
Loop
strWrite.Close()
UPDATED CODE
Dim fName As String = "C:\LargeFile.csv"
Dim wrtFile As String = "C:\ProcessedFile.txt"
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim line As String = ""
Do While strRead.Peek <> -1
line = strRead.ReadLine
line = line.Replace(Chr(34) + Chr(44) + Chr(34), "|")
line = line.Replace(Chr(34), "")
strWrite.WriteLine(line)
Loop
strWrite.Close()
I tested your code and attempted to make a speed improvement by accumulating output lines into a StringBuilder. I also moved the regex declaration outside the loop.
When that did not work, I examined the CPU usage and disk I/O with Windows Process Monitor and it turned out that the bottleneck is the CPU (even when using an HDD instead of an SSD).
That led me to try an alternative method for modifying the text: if all you need to do is replace "," with | and remove any remaining double-quotes, then
newLine = line.Replace(""",""", "|").Replace("""", "")
turns out to be much faster (roughly fourfold in my testing) than using a regex.
(Further improvement might be possible with multi-threading, as @Werdna suggested, as long as more than one processor is available and you can coordinate writing back the modified data in the correct order.)
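As a minimal sketch, the plain-`Replace` approach can be wrapped in a streaming loop like the one below. The names `PipeConvertDemo`, `ConvertLine`, and `ConvertFile` are hypothetical, and the use of `File.ReadLines` with `Using` blocks is a choice made here, not something measured in the answer. It assumes every field is quoted and no field contains an embedded double quote.

```vbnet
Imports System
Imports System.IO

Module PipeConvertDemo
    ' Replaces the "," between quoted fields with | and drops the remaining quotes.
    Function ConvertLine(line As String) As String
        Return line.Replace(""",""", "|").Replace("""", "")
    End Function

    ' Streams the input one line at a time so the whole file is never held in memory.
    Sub ConvertFile(inputPath As String, outputPath As String)
        Using writer As New StreamWriter(outputPath)
            For Each line As String In File.ReadLines(inputPath)
                writer.WriteLine(ConvertLine(line))
            Next
        End Using
    End Sub

    Sub Main()
        Console.WriteLine(ConvertLine("""Smith, John"",""Snow, John"""))
    End Sub
End Module
```

The `Using` block also ensures the writer is flushed and closed if an exception occurs partway through the file.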

Combine 2 lines, sometimes, in large text files

I need to do a search and replace across 2 lines of a large ASCII text file, where this may occur n times (n>1000) in random places. A text file looks like this:
....
StringVariable='
my contents'
.....
and I want it to read:
....
StringVariable='my contents'
....
For small files, I use AllText, which works fine:
My.Computer.FileSystem.WriteAllText(MyInputFile, My.Computer.FileSystem.ReadAllText(MyOutputFile).Replace("='" & vbCrLf, "='"), False)
For large ones, AllText crashes with out of memory error. I see posts to use ReadLine and WriteLine, and how to test strings for characters, but I am missing how to combine multiple lines 'n' times without losing my place in the file. I guess I could split the large file into many small files carefully to allow use of AllText, and then recombine, but that seems crude. Is there a better way?
I see how to fix the case listed above, but I have other cases (e.g., 2 CRs after a specific string) and I am struggling to handle the general case where you want to replace a multi-line string with a variable-length multi-line string.
Here is the code I used for the initial case above:
Private Sub RemoveCRBefore(ByVal Infile As String, ByVal Outfile As String, ByVal LookedFor As String)
Dim Line0 As String = ""
Dim LinedUp As String = ""
Dim LookLong As Integer = LookedFor.Length
Dim FirstLine As Boolean = True
Using sr As StreamReader = New StreamReader(Infile)
Using sw = System.IO.File.CreateText(Outfile)
Dim Line1 As String = sr.ReadLine
Do While (Not Line1 Is Nothing)
If Line1.Length >= LookLong Then
If LookedFor = Line1.Substring(0, LookLong) And Not FirstLine Then
LinedUp = Line0.Replace(vbCrLf, "") & Line1
Line0 = LinedUp
FirstLine = True
Else
If FirstLine = False Then sw.WriteLine(Line0)
Line0 = Line1
End If
Else
sw.WriteLine(Line0)
Line0 = Line1
End If
Line1 = sr.ReadLine
FirstLine = False
Loop
sw.WriteLine(Line0)
End Using
End Using
End Sub