How can I ignore a new line character when reading CSV file in VB.NET? - vb.net

I wrote a utility in VB.NET that reads an input CSV file, does some processing (specifically it ignores the first 5 lines of the input file and replaces them with a header row saved in another file) and writes the information from the input file into a new output CSV file.
My program fails when the input data includes newline characters inside a column value in the CSV.
I would like to ignore newline characters within a CSV data row when I load it into my string array.
Here is my code (it's embedded in a form):
Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
Dim incsvPath = strFileName
Dim outcsvPath = fi.DirectoryName & "\" & outfilename
Dim headerPath = fi.DirectoryName & "\ACTIVITY_HISTORY_HEADER.csv"
Dim fileP As String = incsvPath
Dim fileheader As String = headerPath
Dim CSVheaderIn As New ArrayList
Dim CSVlinesIn As New ArrayList
Dim CSVout As New List(Of String)
CSVheaderIn.AddRange(IO.File.ReadAllLines(fileheader))
CSVlinesIn.AddRange(IO.File.ReadAllLines(fileP))
messageTB.AppendText(vbCrLf & vbCrLf)
For Each line As String In CSVheaderIn
Dim nameANDnumber As String() = line.Split(",")
messageTB.AppendText("csv file header row = " & line & vbCrLf & vbCrLf & "csv file contents follow ..." & vbCrLf)
CSVout.Add(line)
Next
Dim mySubAL As ArrayList = CSVlinesIn.GetRange(5, CSVlinesIn.Count - 5)
For Each line As String In mySubAL 'CSVlinesIn
messageTB.AppendText(line & vbCrLf)
CSVout.Add(line)
Next
IO.File.WriteAllLines(outcsvPath, CSVout.ToArray)
End Sub

This is fairly hard work actually; it'll be easier to use a library that knows how to read and write CSV with newlines in the data than roll your own - not saying you couldn't, but it's a wheel that has already been invented so why do it again?
I used Steve Hansen's Csv library: right-click your project in Solution Explorer, choose Manage NuGet Packages, click Browse, search for "csv", and install the right one.
Imports System.Text
Imports Csv
Imports System.IO
Module Module1
Sub Main(args As String())
'open the headers file
Using hIn = File.OpenText("C:\temp\h.csv")
'setup instruction to the csv reader with headersabsent flag so we can get the first line as data
Dim hOptions = New CsvOptions With {.HeaderMode = HeaderMode.HeaderAbsent}
'take the first line into an array - these are our headers
Dim headers = CsvReader.Read(hIn, hOptions)(0).Values
'open the data file,
Using fIn = File.OpenText("C:\temp\a.csv")
'setup instruction for the reader to skip 5 rows, treat first row as data, and allow newlines in quoted fields
Dim fOptions = New CsvOptions With {.RowsToSkip = 5, .HeaderMode = HeaderMode.HeaderAbsent, .AllowNewLineInEnclosedFieldValues = True}
Using fOut = File.CreateText("C:\temp\a_out.csv")
'convert the ICsvLine rows in the reader to rows of String() that the writer will accept, and write them under the headers
CsvWriter.Write(fOut, headers, CsvReader.Read(fIn, fOptions).Select(Function(line) line.Values))
End Using
End Using
End Using
End Sub
End Module
You don't have to use this library to read the headers; you could just use File.OpenText("C:\temp\h.csv").ReadLine().Split(","c) instead.
If you want to perform per-line processing on the elements, do this:
CsvWriter.Write(fOut, headers, CsvReader.Read(fIn, fOptions).Select(Function(line) ProcessLine(line.Values)))
...
Function ProcessLine(input As String()) As String()
'Note: If(input(8), "") returns input(8) unless it is nothing in which case "" is returned instead
If If(input(8), "").Length > 10 Then input(8) = input(8).Remove(10) 'Trim if over 10
If If(input(14), "").Length > 10 Then input(14) = input(14).Remove(10)
Return input 'Always return
End Function
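If adding a package is not an option, the underlying idea can be sketched by hand. The example below is a minimal sketch, not the library's method: it assumes fields containing line breaks are enclosed in double quotes, and it treats a record as complete once it contains an even number of quote characters. JoinQuotedLines is a hypothetical helper name, and replacing the embedded line break with a space is an illustrative choice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module CsvRecordJoiner
    ' Hypothetical helper: merges physical lines into logical CSV records.
    ' Assumption: a record is complete once it holds an even number of " characters
    ' (escaped quotes "" come in pairs, so they do not disturb the parity).
    Function JoinQuotedLines(lines As IEnumerable(Of String)) As List(Of String)
        Dim records As New List(Of String)
        Dim pending As String = Nothing
        For Each line In lines
            ' Replace the embedded line break with a space so the record fits on one line
            pending = If(pending Is Nothing, line, pending & " " & line)
            Dim quoteCount = pending.Count(Function(ch) ch = """"c)
            If quoteCount Mod 2 = 0 Then
                records.Add(pending)
                pending = Nothing
            End If
        Next
        ' Keep a trailing unterminated record rather than dropping data silently
        If pending IsNot Nothing Then records.Add(pending)
        Return records
    End Function

    Sub Main()
        ' Three physical lines, two logical records
        Dim physical = {"1,""first line", "second line"",x", "2,plain,y"}
        For Each record In JoinQuotedLines(physical)
            Console.WriteLine(record)
        Next
    End Sub
End Module
```

In the original utility, the result of JoinQuotedLines(IO.File.ReadAllLines(fileP)) could be used in place of the raw lines, although skipping the first 5 rows would then count logical records rather than physical lines.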

Related

Text file split in blocks vb.net

I am trying to go through my text file and create a new file that will contain only the text I require. My current file looks like:
Car-1I
Colour-39
Cost-328
Dealer-28
Car-2
Colour-30
Cost-234
For each block of text I would like to read the first line; if the first line ends with an I, then read the next line; if that line contains colour 39, I would like to save the whole block of text to another file. If these two conditions aren't met, I don't want to save the block to the new text file.
Before anyone suggests saving my values in classes: these blocks of text can vary in size and values, so I don't always have a set range of values, which is why I need to skip to the blank line.
IO.File.WriteAllText("C:\Users\test2.txt", "") 'write to new file
Dim sKey As String
Dim sValue As Integer
For Each filterLine As String In File.ReadLines("C:\Users\test.txt")
sKey = Split(filterLine, ":")(0)
sValue = Split(filterLine, ":")(1)
If Not sValue.EndsWith("I") Then
ElseIf sValue.EndsWith("I") Then
End If
Next
Another method uses File.ReadLines to read lines of text from the file. This method doesn't load all the text into memory; it reads single lines from disk, so it is also useful when dealing with big files.
You could loop over the IEnumerable collection it returns, but you can also use its GetEnumerator() method to control more directly when to move to the next line, or to move more than one line forward.
The enumerator's Current property returns the line of text currently read; MoveNext() moves to the next line.
A StringBuilder is used to store the strings when a match is found. Strings are added to the StringBuilder object using its AppendLine() method.
This class is useful when dealing with strings that you need to store, compare and discard (or modify) quickly: since strings are immutable, when you use String variables directly, especially in loops, you generate a lot of garbage that slows down the procedure considerably.
The blocks of text stored in the StringBuilder object are then written to a destination file using a StreamWriter with explicit encoding set to UTF-8 (writes the BOM). Its methods include asynchronous versions: WriteLine() can be replaced by Await WriteLineAsync() to allow an async procedure.
Imports System.IO
Imports System.Text
Dim sourceFilePath = "<Path of the source file>"
Dim resultsFilePath = "<Path of the destination file>"
Dim sb As New StringBuilder()
Dim enumerator = File.ReadLines(sourceFilePath).GetEnumerator()
Using sWriter As New StreamWriter(resultsFilePath, False, Encoding.UTF8)
While enumerator.MoveNext()
If enumerator.Current.EndsWith("I") Then
sb.AppendLine(enumerator.Current)
enumerator.MoveNext()
If enumerator.Current.EndsWith("39") Then
While Not String.IsNullOrWhiteSpace(enumerator.Current)
sb.AppendLine(enumerator.Current)
enumerator.MoveNext()
End While
sWriter.WriteLine(sb.ToString())
End If
sb.Clear()
End If
End While
End Using
This will work:
Dim strFile As String = "c:\Test5\Source.txt"
Dim strOutFile As String = "c:\Test5\OutPut.txt"
Dim strOutData As String = ""
Dim SourceGroups As String() = Split(File.ReadAllText(strFile), vbCrLf + vbCrLf)
For Each sGroup As String In SourceGroups
Dim OneGroup() As String = Split(sGroup, vbCrLf)
If Strings.Right(OneGroup(0), 1) = "I" And (Strings.Right(OneGroup(1), 2) = "39") Then
If strOutData <> "" Then strOutData += (vbCrLf & vbCrLf)
strOutData += sGroup
End If
Next
File.WriteAllText(strOutFile, strOutData)
Something like this should work:
Dim base, i, c as Integer
Dim lines1 As String() = File.ReadAllLines("C:\Users\test.txt") 'ReadAllLines returns an array; ReadLines returns an IEnumerable
c = lines1.Length
While i < c
if Len(RTrim(lines1(i))) Then
If Strings.Right(RTrim(lines1(i)), 1)="I" Then
base = i
i += 1
If i < c AndAlso Strings.Right(RTrim(lines1(i)), 2)="39" Then
While i < c AndAlso Len(RTrim(lines1(i))) > 0 'skip to the next blank, stopping at the end of the file
i += 1
End While
' write lines1(from base to (i-1)) here
Else
While i < c AndAlso Len(RTrim(lines1(i))) > 0
i += 1
End While
End If
Else
i += 1
End If
Else
i += 1
End If
End While
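The answers above all rely on the same idea: a block runs until the next blank line, and it is kept only if its first line ends with "I" and its second line ends with "39". A minimal sketch of that idea, collecting each block first and testing it once it is complete, might look like the following. SelectBlocks and AddIfMatch are hypothetical names, and the sample data assumes a blank line separates the two blocks shown in the question.

```vbnet
Imports System
Imports System.Collections.Generic

Module BlockFilter
    ' Hypothetical helper: returns the blocks whose first line ends with "I"
    ' and whose second line ends with "39". Blocks are separated by blank lines.
    Function SelectBlocks(lines As IEnumerable(Of String)) As List(Of String)
        Dim result As New List(Of String)
        Dim block As New List(Of String)
        For Each line In lines
            If String.IsNullOrWhiteSpace(line) Then
                AddIfMatch(block, result)
                block.Clear()
            Else
                block.Add(line.TrimEnd())
            End If
        Next
        AddIfMatch(block, result) ' the last block may not be followed by a blank line
        Return result
    End Function

    ' Appends the block to the result when both conditions hold
    Private Sub AddIfMatch(block As List(Of String), result As List(Of String))
        If block.Count >= 2 AndAlso block(0).EndsWith("I") AndAlso block(1).EndsWith("39") Then
            result.Add(String.Join(Environment.NewLine, block))
        End If
    End Sub

    Sub Main()
        ' Assumed layout: a blank line between the two blocks
        Dim sample = {"Car-1I", "Colour-39", "Cost-328", "Dealer-28", "", "Car-2", "Colour-30", "Cost-234"}
        For Each b In SelectBlocks(sample)
            Console.WriteLine(b)
            Console.WriteLine()
        Next
    End Sub
End Module
```

Because the function takes any sequence of lines, it can be fed File.ReadLines(path) for large files, and the matching blocks can then be written out with a StreamWriter or File.WriteAllLines.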

VB.Net Search for text and replace with file content

This is a follow on question to a post I made. Append one file into another file
I need to search the master document for entities "&CH1.sgm" to "&CH33.sgm",
mark where they are in the master document and replace the entity call with the matching file "Chapter1.sgm" found in "fnFiles". I can change the file names and entities to anything if that will help.
My code copies the text of a file and appends it to the bottom of the master_document.sgm. But now I need it to be more intelligent. Search the Master document for entity markers, then replace that entity marker with that file contents match. The file number and entity number match up. e.g.(&CH1; and Bld1_Ch1.sgm)
Private Sub btnImport_Click(sender As Object, e As EventArgs) Handles btnImport.Click
Dim searchDir As String = txtSGMFile.Text 'Input field from form
Dim masterFile = "Bld1_Master_Document.sgm"
Dim existingFileMaster = Path.Combine(searchDir, masterFile)
'Read all lines of the Master Document
Dim strMasterDoc = File.ReadAllText(existingFileMaster) '// add each line as String Array.
'?search strMasterDoc for entities &Ch1.sgm
'?replace entity name "&Ch1.sgm" with content of file "Bld1_Ch1.sgm" this content if found below
'? do I use a book mark? Replace function?
'Get all the sgm files in the directory specified
Dim fndFiles = Directory.GetFiles(searchDir, "*.sgm")
'Set up the regular expression you will make as the condition for the file
Dim rx = New Regex(".*_Ch\d\.sgm")
Dim ch1 = New Regex(".*_Ch[1]\.sgm")
'Use path.combine for concatenatin directory together
'Loop through each file found by the REGEX
For Each fileNo In fndFiles
If rx.IsMatch(fileNo) Then
If ch1.IsMatch(fileNo) Then
Dim result = Path.GetFileName(fileNo)
'Use path.combine for concatenatin directory together
Dim fileToCopy = Path.Combine(searchDir, result)
'This is the file we want to copy into MasterBuild but at specific location.
'match &ch1.sgm inside strMasterDoc
Dim fileContent = File.ReadAllText(fileToCopy)
'Search master file for entity match then append all content of fileContent
File.AppendAllText(existingFileMaster, fileContent)
MessageBox.Show("File Copied")
End If
End If
Next
Close()
End Sub
If I understand correctly (big if), you want to replace the text of the abbreviated chapter name in the master file with the contents of the file it refers to, at the spot where the abbreviation is found.
I made a class to handle the details.
Private Sub btnImport_Click(sender As Object, e As EventArgs) Handles btnImport.Click
'Add a FolderBrowserDialog to your form designer
FolderBrowserDialog1.ShowDialog()
Dim searchDir As String = FolderBrowserDialog1.SelectedPath
Dim existingFileMaster = Path.Combine(searchDir, "Bld1_Master_Document.sgm")
Dim lstFileChanges = CreateList(searchDir)
'The following method does NOT return an array of lines
Dim strMasterDoc = File.ReadAllText(existingFileMaster)
For Each fc In lstFileChanges
strMasterDoc = strMasterDoc.Replace(fc.OldString, fc.NewString)
Next
File.WriteAllText(existingFileMaster, strMasterDoc)
End Sub
Private Function CreateList(selectedPath As String) As List(Of FileChanges)
Dim lstFC As New List(Of FileChanges)
For i = 1 To 33 'chapters &CH1.sgm to &CH33.sgm; lstFC is still empty here, so lstFC.Count would be 0
Dim fc As New FileChanges
fc.OldString = $"&CH{i}.sgm"
fc.FileName = $"Chapter{i}.sgm"
fc.NewString = File.ReadAllText(Path.Combine(selectedPath, fc.FileName))
lstFC.Add(fc)
Next
Return lstFC
End Function
Public Class FileChanges
Public Property OldString As String '&CH1.sgm
Public Property FileName As String 'Chapter1.sgm
Public Property NewString As String 'Contents of Chapter1.sgm, the string to insert
End Class
Testing .Replace
Dim s As String = "The quick brown fox jumped over the lazy dogs."
s = s.Replace("fox", "foxes")
MessageBox.Show(s)
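As an alternative to calling Replace once per chapter, the markers can be found with a regular expression and replaced in a single pass. This is a minimal sketch of a different technique from the answer above, assuming the markers look like "&CH1.sgm"; ExpandEntities and its loadChapter parameter are hypothetical names, and the lambda in Main stands in for reading the chapter file.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EntityExpander
    ' Hypothetical helper: replaces every "&CH<n>.sgm" marker with whatever
    ' loadChapter returns for that chapter number.
    Function ExpandEntities(master As String, loadChapter As Func(Of Integer, String)) As String
        Return Regex.Replace(master, "&CH(\d+)\.sgm",
                             Function(m As Match) loadChapter(Integer.Parse(m.Groups(1).Value)))
    End Function

    Sub Main()
        Dim master = "<book>&CH1.sgm&CH2.sgm</book>"
        ' In the real program this lambda would call File.ReadAllText on the chapter file
        Dim expanded = ExpandEntities(master, Function(n) $"<chapter>{n}</chapter>")
        Console.WriteLine(expanded)
    End Sub
End Module
```

With this approach only the chapters actually referenced in the master document are loaded, and inserted chapter text is not scanned again for further markers.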

Count words in an external file using delimiter of a space

I want to calculate the number of words in a text file using a delimiter of a space (" "), however I am struggling.
Dim counter = 0
Dim delim = " "
Dim fields() As String
fields = Nothing
Dim line As String
line = Input
While (SR.EndOfStream)
line = SR.ReadLine()
End While
Console.WriteLine(vbLf & "Reading File.. ")
fields = line.Split(delim.ToCharArray())
For i = 0 To fields.Length
counter = counter + 1
Next
SR.Close()
Console.WriteLine(vbLf & "The word count is {0}", counter)
I do not know how to open the file and get this to work. I am very confused and would like an explanation so I can edit the code and understand it.
You're going to be reading a file as the source of the data, so let's create a variable to refer to its filename:
Dim srcFile = "C:\temp\twolines.txt"
As you have shown already, a variable is needed to hold the number of words found:
Dim counter = 0
To read from the file, a StreamReader will do the job. Now, we look at the documentation for it (yes, really) and notice that it has a Dispose method. That means that we have to explicitly dispose of it after we've used it, to make sure that no system resources stay tied up longer than necessary (for example, the file could remain locked). Fortunately, there is the Using construct to take care of that for us:
Using sr As New StreamReader(srcFile)
And now we want to iterate over the content of the file line-by-line until the end of the file:
While Not sr.EndOfStream
Then we want to read a line and find how many items separated by spaces it has:
counter += sr.ReadLine().Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Length
The += operator is like saying "add n to a" instead of saying "a = a + n". The {" "c} is a literal array containing the character " "c. The c tells the compiler that it is a character and not a string of one character. StringSplitOptions.RemoveEmptyEntries means that if the text were "one  two" (with two spaces between the words), the empty entry between the spaces would be discarded rather than counted as a word.
So, if you were writing a console program, it might look like:
Imports System.IO
Module Module1
Sub Main()
Dim srcFile = "C:\temp\twolines.txt"
Dim counter = 0
Using sr As New StreamReader(srcFile)
While Not sr.EndOfStream
counter += sr.ReadLine().Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Length
End While
End Using
Console.WriteLine(counter)
Console.ReadLine()
End Sub
End Module
Any embellishments such as writing out what the number represents or error checking are left up to you.
With Path.Combine you don't have to worry about where the slashes or back slashes go. You can get the path of special folders easily using the Environment class. The File class of System.IO is shared so you don't have to create an instance.
Public Sub Main()
Dim p As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "Chapters.txt")
Debug.Print(Environment.SpecialFolder.MyDocuments.ToString)
Dim count As Integer = GetCount(p)
Console.WriteLine(count)
Console.ReadKey()
End Sub
Private Function GetCount(Path As String) As Integer
Dim s = File.ReadAllText(Path)
Return s.Split().Length
End Function
Use the Split function and take the length of the resulting array directly: that length is already the word count. (Only if you count the delimiters instead would you add 1.)
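The per-line counting described above can also be written as one expression with LINQ. This is a minimal sketch; CountWords is a hypothetical name, and it takes a sequence of lines so it can be tried without a file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module WordCounter
    ' Hypothetical helper: sums the number of space-separated words on each line.
    ' RemoveEmptyEntries keeps repeated spaces and empty lines from inflating the count.
    Function CountWords(lines As IEnumerable(Of String)) As Integer
        Return lines.Sum(Function(line) line.Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Length)
    End Function

    Sub Main()
        ' "one  two" contains two spaces; the empty line contributes nothing
        Console.WriteLine(CountWords({"one  two", "", "three four five"}))
    End Sub
End Module
```

For a real file, CountWords(File.ReadLines(srcFile)) would read the lines lazily, in the same way as the StreamReader loop.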

How to exclude selected text and extract the next text separated with " : "?

I'm building a machine-readable dictionary for my native language, Malay. I need to extract the Malay translation from a .txt file. The entries in the .txt file look like this:
aberration : aberasi
aberration function : rangkap aberasi
ablation : ablasi
ablative material : bahan ablasi
The left side is the term in English, and the Malay translation follows the separator :.
What I would like to ask is: when the search term is "aberration function", how do I display only "rangkap aberasi"?
I tried to use:
Imports System.Text.RegularExpressions
Public Class Form1
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim text As String = "dictionary.txt"
Dim word As String = "\b" & TextBox2.Text & "\b\s+(\w+)"
For Each a As Match In Regex.Matches(text, word, RegexOptions.IgnoreCase)
MsgBox(a.Groups(1).Value)
Next
End Sub
End Class
See more at: http://www.visual-basic-tutorials.com/get-the-next-word-after-a-specific-word-in-visual-basic.htm#sthash.w8E1Qb6l.dpuf
However, my problem is that the code above only displays the next word after the separator, which is "rangkap", while I need the whole text after the separator ":", which might be more than two words.
Here is my current code:
Using reader As New StreamReader("D:\Dictionary of Engineering A only.txt")
While Not reader.EndOfStream
Dim line As String = reader.ReadLine()
If line.Contains(wordsearch.Text) Then
Edef.Text = line
Exit While
End If
End While
End Using
One way to solve this would be like this:
Using reader As New StreamReader("D:\Dictionary of Engineering A only.txt")
While Not reader.EndOfStream
Dim line As String = reader.ReadLine()
If line.Contains(wordsearch.Text) Then
Dim lineParts() As String = line.Split(":"c)
Edef.Text = lineParts(1).Trim() 'Trim removes the space that follows the separator
Exit While
End If
End While
End Using
You check if the line contains the text, if so, you split the line by the ":" and take the 2nd part of it.
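One caveat with Contains is that a search for "aberration" also matches the line for "aberration function". A minimal sketch that compares the whole English term instead is shown below; FindTranslation is a hypothetical name, and the case-insensitive comparison is an assumption about the desired behaviour.

```vbnet
Imports System
Imports System.Collections.Generic

Module DictionaryLookup
    ' Hypothetical helper: returns the text after ":" for the line whose English
    ' term equals the search term, or Nothing when no line matches.
    Function FindTranslation(lines As IEnumerable(Of String), term As String) As String
        For Each line In lines
            ' Split into at most two parts so a ":" inside the translation is preserved
            Dim parts = line.Split({":"c}, 2)
            If parts.Length = 2 AndAlso
               String.Equals(parts(0).Trim(), term.Trim(), StringComparison.OrdinalIgnoreCase) Then
                Return parts(1).Trim()
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim entries = {"aberration : aberasi", "aberration function : rangkap aberasi", "ablation : ablasi"}
        Console.WriteLine(FindTranslation(entries, "aberration function"))
    End Sub
End Module
```

In the form, the call could become Edef.Text = FindTranslation(File.ReadLines(path), wordsearch.Text), where path is the dictionary file.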

Why does my CSV Writer skips some lines? vb.net

I don't know why my CSV writer skips some lines. I tried to export a grid view today and noticed it by coincidence. The grid has 250 rows. It writes rows 1-60 and then continues at 140-250; rows 61-139 are missing in the Excel table.
About the code: first I create a list with all columns, because the writer method should also be able to write only specific columns, but that is for another button.
Then the grid and the list are passed to the CSV writer.
Private Sub Button4_Click(sender As Object, e As EventArgs) Handles Button4.Click
Dim list As List(Of Integer) = New List(Of Integer)
For Each column As DataGridViewColumn In datagridview2.Columns
list.Add(column.Index)
Next
Module3.csvwriter(list, datagridview2)
MsgBox("Export finished")
End Sub
The CSV writer creates a CSV body string for every data row.
For each row, it appends every column value to body if the column index is in the list.
The odd thing is that I have seen the method build row 61, but the row is then missing in the CSV :/
Public Sub csvwriter(list As List(Of Integer), grid As DataGridView)
Dim body As String = ""
Dim myWriter As New StreamWriter(Importverzeichnis.savecsv, True, System.Text.Encoding.Default)
Dim i As Integer
For i = 0 To grid.Rows.Count - 1
For ix = 0 To grid.Columns.Count - 1
If list.Contains(ix) Then
If grid.Rows(i).Cells(ix).Value IsNot Nothing Then
body = body + grid.Rows(i).Cells(ix).Value.ToString + ";"
Else
body = body + ";"
End If
End If
Next
myWriter.WriteLine(body)
body = ""
Next
myWriter.Close()
End Sub
Does anyone have an idea? Am I overlooking something?
I found an answer.
The CSV file is complete; every row is in there.
The problem is that my data rows contain a " character, which escapes the ; delimiter when the file is opened.
To solve the problem I now save the file as a .txt file,
because then the user is asked every time the file is opened which character is the delimiter and which is the quote character. This way the user is forced to enter the right settings, whereas a .csv file is opened with the default configuration.
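Another option, which keeps the .csv extension, is to quote the values when writing. The sketch below follows the common CSV convention (as described in RFC 4180) of wrapping a field in double quotes when it contains the delimiter, a quote, or a line break, and doubling any quotes inside it. QuoteField is a hypothetical helper, not part of the original code.

```vbnet
Imports System
Imports System.Linq
Imports Microsoft.VisualBasic

Module CsvQuoting
    ' Hypothetical helper: returns the value in a form that is safe to place
    ' between delimiters in a CSV line.
    Function QuoteField(value As String, Optional delimiter As Char = ";"c) As String
        If value Is Nothing Then Return ""
        Dim needsQuotes = value.IndexOfAny({delimiter, """"c, ControlChars.Cr, ControlChars.Lf}) >= 0
        If Not needsQuotes Then Return value
        ' Double every embedded quote, then wrap the whole field in quotes
        Return """" & value.Replace("""", """""") & """"
    End Function

    Sub Main()
        ' The second cell contains both a quote character and the delimiter
        Dim cells = {"12", "3"" pipe; steel", "ok"}
        Console.WriteLine(String.Join(";", cells.Select(Function(c) QuoteField(c))))
    End Sub
End Module
```

In csvwriter, the line that appends a cell would then become body = body + QuoteField(grid.Rows(i).Cells(ix).Value.ToString) + ";".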